PRS reports are primarily technical. While conclusions affecting
military policy or operations may appear in them, they are not
intended as a basis for official action. Findings and conclusions
contained in PRS reports are intended to guide the conduct of further
research. When research findings suggest recommendations for
administrative action, such recommendations are made separately to
the appropriate military agency.
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of  pictorial  tests  of 
ah  j£hb  assbsshbk?1  of  ys^aoBsax* 

(Based on PRS Report 988)


STATEMENT OF THE PROBLEM

One of the most important problems of personnel management in the
Army is identifying (1) men with qualities of leadership, and (2) men
who can readily be trained as officers and noncommissioned leaders.

V^'sV.OV* ; vT;r  ;!.'*'i  ' * vv.»-  '-'*v«v;  *'*  *•  „;r  •"  v:-  '.r. ':;*■?  "•  i •.*}>'.  a *k  ?(••  . '• 

ful than others. The purpose of the present study was to evaluate three
new pictorial tests of personality as predictors of leadership ability
at the U. S. Military Academy and in Leaders' Schools.


RESULTS


1. For a sample of privates enrolled at Leaders' Schools, two out
of three of the tests gave a better than chance differentiation between
men rated high by their associates and men rated low.

2. The power of these tests to distinguish between high rated
privates and low rated privates is about the same as that of a test
already in use, "the Leaders Self-Description Blank." The new tests are
not closely related to the old one.

3. However, for noncommissioned officers at Leaders' Schools, no
one of the tests differentiated between high rated men and low rated
men.


4. None of the tests gave scores related to Aptitude-for-Service
Ratings for cadets at the Military Academy.

CONCLUSIONS

1. The validity of pictorial tests used in this experiment was
insufficient to add significantly to the validity attainable with
Self-Description Blanks previously developed by the Personnel Research
Section, Personnel Research and Procedures Branch, The Adjutant General's
Office.

2. In order for pictorial tests to become effective leadership
predictors, it appears necessary to effect improvement in item content
and format. Whether such improvement would be sufficient to warrant
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VOLK  SUMR&KZ 

Three new tests, Picture Interpretation Test, Army Picture Story
Test, and the Picture Fill-In Test, were administered to 216 cadets at
the Military Academy and 968 enlisted men in Leaders' Schools. In
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4k;  associate  rating  for  anliated  jaen  ajid  y®r.i£;te&  «m 
grjjiip*  nf  H55B  s“i®ts  aid  258  enlisted  jiirau 
This is a report of a research study initiated in May, 1950, and
performed under Contract Number DA-49-083 OSA-64, negotiated under
authority of Section B (a) (5), Act of February 19, 1948 (Public Law 413,
80th Congress).

The study was conceived and executed in association with the staff

tvP  n.y , ‘ i . ipv  j r • \.  ’1  r*'P?-:  .- 

Department of the Army. Particularly noteworthy were the contributions
made by Drs. D. E. Baier, H. E. Brogden, H. Perloff, E. K. Taylor, and
the late Dr. C. I. Mosier.

On the staff of the contractor, indispensable collaboration was
furnished by a number of individuals in addition to those whose names
appear, somewhat arbitrarily, in authorship. Among them are
Dr. Ernst G. Beier, Mr. D. K. Sable, Miss Mildred 3. Leonard,
Mr. Boll and Tongas, and Mrs. Elizabeth B. Coleman and her scoring
staff.


The cooperation of the authorities at Forts Belvoir, Dix,
Jackson, and Knox, and at the U. S. Military Academy, notably
Lt. Col. Raymond Bonpf, Maj. Herman 7. Smith, and Dr. Douglas Spencer,
was invaluable in the acquisition of data.

To these individuals, and to many others of whom space or
memory preclude the mention, the authors express their gratitude.
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A.  Problem 

1. To identify those items, in each of three new objectively scored
projective tests, which discriminate between superior and inferior
leaders among West Point cadets and enlisted trainees in Leaders Schools.

2. To try out the resulting scoring keys for the assessment of
leadership in new samples of personnel.

3. To compare the validity of these keys with that of a biographical
inventory currently used by the Army for leadership assessment.

4. To factor analyse the several tests found valid with West
Point cadets along with other leadership measures, in order to
investigate basic personality factors intrinsic to such measures.

B. Method

1. The Tests

a. Picture Interpretation Test - involves selective identification
with individuals depicted in various roles and activities.

b. Army Picture Story Test - involves the ranking of statements
with regard to their appropriateness in describing each of
a series of pictures.

c. Picture Fill-In Test - entails the rating of appropriateness
of rejoinders in conversational situations depicted in
cartoons.

d. West Point Personal Inventory - a series of biographical
and self-descriptive questions, used with West Point cadets.

e. Leaders Self-Description Blank - a series of biographical
and self-descriptive questions, used with Leaders School trainees.


2. The Criteria

a. The West Point Aptitude Rating was used as the measure of
leadership performance of West Point cadets. This is a composite
rating on leadership made by the cadet's peers and tactical officer.

b. The Associate Rating, mainly a nomination rating by peers,
was employed as the standard of leadership performance of Leaders
School trainees.


3. The three projective tests were administered to 454 West
Point cadets and 959 Leaders School trainees. The West Point Personal
Inventory was also administered to these cadets. Criterion data were
obtained for as many of these individuals as feasible.

4. All items on the three projective tests were biserially
correlated with the criterion. This was done for four groups of subjects,
as follows: two randomly selected groups of cadets, numbering 21® and

•r:"ir|-^=  : ' '"-.Vi"'  r i 1 . I,  t f \* » •’  '•  •.  ; 4 

commissioned officers enrolled at Leaders Schools.

5. Scoring keys were developed from this analysis, those items
being keyed which had criterion correlations minimally significant at
the 10% level of confidence.

6. The Picture Fill-In and Picture Interpretation Tests, and the
Leaders  Self-Description  Blank,  were  administered  to  a new  sample  of 
S96  privates  enrolled  at  Leaders  Schools.  Criterion  data  were  secured 
for  these  individuals.  Validity  and  reliability  statistics  were 
computed  for  this  group. 

C. Results

1. In the two samples of West Point cadets, there was no better
than a chance relationship between responses to the items on all three
projective tests and the Aptitude Ratings received by the cadets. The
West Point Personal Inventory had a correlation of .35 with the
criterion in the two samples combined.

2. Similar negative results were obtained in the item analysis
of the tests against Associate Ratings of non-commissioned officers
enrolled in Leaders Schools.


3. In a sample of privates enrolled at Leaders Schools, it was
possible to identify in two of the three projective tests an appreciably
larger-than-chance number of items that distinguished between the higher-
and lower-rated men. These two tests were the Picture Interpretation and
Picture Fill-In Tests.


4. These tests, when scored for the new sample of privates by the
scoring key developed on the first group, yielded validity coefficients
of .35 for the Picture Interpretation Test, and .19 for the Picture
Fill-In Test. The Leaders Self-Description Blank had a validity
coefficient of .30 in this sample. Each of these coefficients differs
significantly from zero at the 1% level of confidence.


5. The split-half reliability coefficients, augmented by the
Spearman-Brown Prophecy formula, were .85 for the Picture Interpretation
Test and .91 for the Picture Fill-In Test in the cross-validation sample.
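The Spearman-Brown step-up applied to split-half reliabilities can be illustrated with a minimal Python sketch. The function names and the example value of .74 are illustrative assumptions, not figures taken from the report; the formula itself is the standard one for predicting the reliability of a lengthened test.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Predicted reliability of a test `length_factor` times as long.

    r_half is the correlation between the two halves of the test.
    With length_factor=2 this is the usual split-half correction.
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


def half_test_correlation(r_full, length_factor=2.0):
    """Inverse of the step-up: the half-test correlation implied by r_full."""
    return r_full / (length_factor - (length_factor - 1.0) * r_full)


if __name__ == "__main__":
    # Hypothetical half-test correlation of .74 steps up to roughly .85.
    print(round(spearman_brown(0.74), 2))
```

Because the correction assumes the two halves are parallel forms, the stepped-up value is best read as an estimate rather than an observed coefficient.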


t.  lm  f ctreiatj  our  of  the  two  testa 
o*ade"C  ' V:  i »»•->  on  i;i  ■ yarn  ;;I  t il- 


ch  aaori  r.triHT  vi  to  *■>«* 

po«i  fci  *r.  ■ 


D. Conclusions

1. The three projective tests, as now constituted, are of no value
for leadership assessment of West Point cadets.

2. These tests are also of no value for leadership assessment
of non-commissioned officers in Leaders Schools.


3. Scoring keys were developed for the Picture Fill-In
and Picture Interpretation Tests on a sample of privates in Leaders
Schools. Scoring the two tests for a new sample of privates by means
of these keys yielded scores which were significantly correlated with
the criterion of leadership in Leaders Schools.

4. The biographical inventories (West Point Personal Inventory
and Leaders Self-Description Blank) showed significant criterion
correlations in their respective samples.

5. Among the privates, the two valid projective tests did not
add appreciably to the predictive power of the biographical inventory
when combined with it in a multiple regression equation. None the less,
their correlations with the inventory are low (about .35), as is their
correlation with one another (.18).

6. It is inferred that the Picture Fill-In Test and the Picture
Interpretation Test show considerable promise as techniques for
leadership assessment, although improvements are needed to translate this
promise into a state of practical utility. Suggestions are made
manifest in this study as to how improvements may be effected in regard to:
(1) power to discriminate more accurately between superior and inferior
leaders, and (2) extending the range of personnel with whom such tests
would be useful.

7. In view of the lack of validity of the projective tests among
West Point cadets, it was not meaningful to proceed with the factor
analysis designed to reveal the basic personality factors common to
these and other measures of leadership, so that this objective of the
study could not be achieved.


I. INTRODUCTION


The identification of men with high potentialities as leaders is
unquestionably a matter of prime importance to the Army. Accordingly,
a considerable amount of research has been done or sponsored by the
Army on techniques for accomplishing such identification.


hi  tyu.—rrU  3 f.  * *•« 


S 4-4  4 ^ V> 


... importance in determining a man's leadership performance. ...
methods for measuring such factors still leave much to be desired in
the way of validity and accuracy. In recent years, the evidence has
grown more suggestive that projective tests1 may have promise along
these lines. However, these tests are typically time consuming to
administer and score, and typically require trained psychologists for
their interpretation. These characteristics are manifestly unsuited
for large-scale military classification purposes.


To circumvent these deficiencies, the Personnel Research Section
of the Adjutant General's Office undertook the preparation of several
tests which are fundamentally projective in nature but which are
amenable to group administration and objective (even machine) scoring. When
any new test is constructed, the questions of its validity and what it
measures immediately arise. These questions become even more urgent
when the test represents a radically new departure. Thus, in the case
of the new objective projective tests, not only are their particular
validities unknown, but also subject to question are the issues of the
general fruitfulness of the approach and of the underlying psychological
dimensions measured by such techniques.

The  research  described  in  this  report  was  undertaken  in  an  effort 
to  shed  light  on  these  questions. 

II. OBJECTIVES

More specifically, the objectives of this research may be
described as follows:


A. To ascertain the validity of each of three objective projective
tests for measuring leadership performance of Army commissioned personnel.


1. To determine the correlation of each item with a
criterion of leadership performance, on the basis of which to
develop a scoring key for each test.


1 A projective test requires the examinee to interpret or structure a
stimulus situation which lends itself to a variety of meanings, and
thereby to reveal aspects of his personality.


2. To ascertain for each of these scoring keys its
reliability and validity against a criterion of leadership performance
at the level of commissioned personnel.
of leadership performance, on the basis of which to develop a
scoring key for each test.


2. To ascertain for each of these scoring keys its
reliability and validity against a criterion of leadership performance
at the level of non-commissioned personnel.

3. To compare the relative validities of these tests with
one another, and with a self-description questionnaire.

C. From these data, to infer the general promise of this type
of test, and to deduce indications of which lines of future
development seem most fruitful.


It was also hoped originally to factor analyse the relationships
among these tests, together with other personality measures
including ratings and behavior measures, with the objective of determining
basic personality factors gauged by such variables. Since the
non-test variables were more appropriate and available in the commissioned
personnel situation (West Point), the plan was to perform this
analysis in connection with the data obtained from that sample.
However, it was discovered in the course of the research that the
projective tests were virtually uncorrelated with the leadership
criterion in this situation, thus making the planned analysis
pointless.
In accordance with these standards, cadets in the upper classes
of the United States Military Academy at West Point were chosen as
the sample whose characteristics and activities were approximately
representative of personnel to be assessed for potential leadership
at the commissioned level.


Students at Leadership Schools were selected as suitable for
representing potential leaders at the non-commissioned level. This
group is actually composed of two subgroups, as regards age,
background, and previous experience: privates and non-commissioned
officers. It was deemed advisable to investigate separately the
validity of the tests for each of the two subgroups.


Thus, there were three categories of personnel who were the
subjects of the investigation: West Point cadets, privates assigned
to Leadership Schools, and non-commissioned officers assigned to
Leadership Schools.

The research design involved the following steps for each of the
categories of personnel:

1. Administering the three objective projective tests to samples
of the personnel.

2. Collection of criterion data for these individuals.

3. Correlation of the test items against the criterion.

4. Development of a scoring key for each test.

5. Application of the key to the test results of new samples
of personnel.

6. Correlation of the scores on each test with the criterion
of leadership performance.


In the remainder of this chapter, the tests will first be
described, followed by a description, for the cadet officers, of the
samples, criterion, procedure for collecting data, and methods of analysing
the data. Finally, the same rubrics of information will be presented
The following tests were included in the validation study:
1. Picture Interpretation Test, 1949 (DA AGO PRT-1775)


This  433  item  test  consists  of  a series  of  <368  picture#,  so®®  of  which 
present  Individuals  participating  la  wilitary  sstivit-iee  and  sf-har^. 

* , O e ^ ^ if  ^ ^ _ i>  <>  t r v > _ --  - « ". 

* .•VlCfCSJV'  .-:  ,t  :<•<■=  Vl  .-•■*.  •.  " *.  r:‘  '•«•  i'-ll  gtiUr  ,.;«*»  uiittvUUUH 

indicate that the test is a measure of interests, although it may be
considered a projective instrument to the extent that the examinee
tends to identify with the situations and individuals illustrated in
the pictures.


Instructions for the first six parts of the test follow the same
general pattern. For the individuals or situations presented in the
pictures in each part of the test, the examinee is required to choose
between two alternative reactions, as follows:


(1) Part I

(a) "Yes, I would like to do what he is doing," or

(b) "No, I would not like to do what he is doing."

(2) Part II

(a) "Yes, I would like to be that person," or

(b) "No, I would not like to be that person."

(3) Part III

(a) "Yes, this person is like me," or

(b) "No, this person is not like me."

(4) Part IV

(a) "Yes, I would admire this person," or

(b) "No, I would not admire this person."

(5) Part V

(a) "Yes, I am good at doing what this person is doing," or

(b) "No, I am not good at doing what this person is doing."

(6) Part VI

(a) "Yes, I like what is shown in this picture," or

(b) "No, I do not like what is shown in this picture."
"Yes, the picture made me think of this idea," or
"No, the picture did not make me think of this idea."

2. Army Picture Story Test, Series B, 1950. Syracuse University
The Army Picture Story Test is an objective test based on the general


pictures. The pictures included in the Army Picture Story Test involve
both military and non-military situations and are not the pictures used
in the Thematic Apperception Test. For each picture, there are thirty
items presented in groups of three. The items are relatively short
statements which are descriptive of the picture. The examinee is
instructed to read the statements within each triad and to select two
statements: the most descriptive and the least descriptive.

The statements used in this test were obtained by administering
the set of ten pictures to a large group of soldiers in a free response
situation. The descriptions written by this group were edited and
arranged in triads on the basis of their frequency of occurrence and
with respect to a number of clinical categories. That is, triads were
composed of items which were approximately equal in frequency of
occurrence but which dealt with different personality needs.

3. Picture Fill-In Test, Second Form, 1949 (DA AGO PRT-1786)


The Picture Fill-In Test is an adaptation of the Rosenzweig Picture-
Frustration Test. It differs from the Rosenzweig test in that the
responses are obtained in objective form. A series of 43 cartoon-
like pictures is presented, comprising a total of 392 items. In each
picture, one individual is represented as saying something to another
individual. Some of the pictures deal with military situations, while
24 pictures were taken directly from the Rosenzweig test. In an
experimental administration of the Preliminary Form of the Picture
Fill-In Test, the examinees wrote responses in the cartoon balloons.
Responses made most frequently by this experimental group were selected
for each of the pictures. Certain responses which seemed to be
particularly revealing or measuring important factors also were included,
regardless of their frequency of occurrence. From seven to ten
responses were selected, and are presented below each picture in the
Second Form of the test. This form, which was used in the present
investigation, was developed so that it would be suitable for
objective scoring in the following manner: The instructions require that
the examinee rate each of the responses presented with the pictures
with respect to how likely it is that the person shown would give
that response. This rating of each response is accomplished on the
following three-point scale:


"Not likely to say something like this."

"Very likely to say something like this."


4. West Point Personal Inventory, 1949 (DA AGO PRT-1756)

(,4i.sc-  referred  fee  aw  *&)%□  gglf-&;'mc7dT»iion  .Bleak.  rorp  II.  1849, 
&A-  160  rsh-lyid.  and  in  previous  progress  wporti  as  Biographical 
AjrlGteSv-.feioa  Blank.  B0T3  edition.) 
test does not make use of pictorial material. The items are in the
form of statements concerning various characteristics, as follows:
Section I includes pairs of statements dealing with personal
characteristics; the individual is instructed to select the statement in
each pair that is the best description of him. In Section II, the
individual makes a choice between each of two activities as to which
he believes he can do better. Statements dealing with likes and
dislikes are presented in Section III, and the individual again
selects the statement in each pair that he likes the better. Section IV
contains statements describing personal characteristics, likes and
dislikes, abilities, and beliefs. For each statement, the individual
indicates whether the statement applies to him or does not apply.


5. Leaders' Self-Description Blank, Form 1951. Syracuse
University Press.


The Leaders' Self-Description Blank is a 342-item version of the
Biographical Information Blank and was used at the non-commissioned
level at the Leaders' Schools in the present investigation. It is
similar in composition to the West Point Personal Inventory, but the
exact content of the items is different. Like the West Point Personal
Inventory, it does not present pictorial material and consists of four
sections.


Section I contains pairs of statements dealing with personal
characteristics. The examinee chooses the statement from each pair
that describes him better. The pairs of statements in Section II
describe various activities, and the individual selects the activity
which he can do better. Pairs of statements are presented in Section III
dealing with likes and dislikes, and the instructions require
selecting the statement that the examinee likes better. Personal
characteristics, likes and dislikes, abilities, and beliefs make up the
content of Section IV, and the examinee is instructed to indicate
whether each statement applies to him or does not apply.
1. Sample

The first and second classes of cadets at the U. S.
Military Academy, West Point, in July, 1950, were the subjects of the
investigation. A total of 464 cadets was tested.

Available cases from this sample were later divided into two
subgroups for analysis. These subgroups comprised, respectively, 213
and 223 cases.

2. Tests1

The tests employed with this sample were:

a. Picture Interpretation Test, 1949 (DA AGO PRT-1775)

b. Army Picture Story Test, Series B, 1950. Syracuse
University Press

c. Picture Fill-In Test, Second Form, 1949
(DA AGO PRT-1786)

d. West Point Personal Inventory, 1949 (DA AGO PRT-1756)

3. Criterion

The Aptitude for the Service System2 rating was ascertained for
each cadet for use as a criterion measure of leadership. The Aptitude
for the Service System is used at West Point for the purpose of
providing an accurate evaluation of the leadership effectiveness of cadets.
The Aptitude Rating is a composite measure including the pooled opinion
of the cadet's Tactical Officer and a small group of classmates within
his Company. The evaluation by his classmates is accomplished through
an associate (buddy) rating procedure.

Each cadet is ranked in order of merit by his Tactical Officer
and by the cadets in his Company in regard to the following definition
of leadership:

"The criterion of my appraisal is each cadet's ability (if or
when placed in command of a group) to elicit the group's maximum
cooperation, maintain the highest possible standards of administration and


1 A description of the tests used in the study is presented in
Section III, A.

2 A detailed description of the Aptitude for the Service System may be
found in "The Operation and Administration of the Aptitude for the
Service System," West Point, New York: United States
Military Academy, 1951.
4. Procedure


a. Test Administration

On June 30 and July 1, 1950, the four tests were
administered in group situations to 216 cadets of the first and second
classes at the U. S. Military Academy, West Point. The test battery
was divided into two sessions, two of the tests being administered
in the first session and the other two tests being given in the
second session. Each session required about three hours of testing
time. A similar procedure was utilized when the second group of West
Point cadets was tested on July 27, 28 and 29, 1950. This second
group of cadets numbered 238 and was drawn from the first and second
classes.


b. Collection of criterion data and constitution of
criterion groups.

Criterion data, entered on Hollerith cards, were
received from the West Point statistical office. These cards
contained the cadet serial number, the mean Aptitude Rating based on the
first term and the second term of the second class, Aptitude Ratings
for both terms, and year of expected graduation. The criterion of
leadership effectiveness utilized in this investigation with the West
Point sample was the mean Aptitude Rating which summarizes the cadet's
leadership performance during his second class.

The total sample was divided randomly on the basis of serial
numbers, group A being composed of those cadets with even serial
numbers, and group B having odd serial numbers. As a check on the
randomness of this procedure, F and t statistics were computed between
the mean Aptitude Index criterion scores of the two groups; this
analysis indicated that the two groups may be considered as random
samples from the same population in regard to the leadership criterion.
The purpose of fractionizing the sample in this manner was to make it
possible to perform a double cross-validation on the scoring keys
derived in the item analyses.
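The split by serial-number parity and the check on the two halves can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the record layout, the function names, and the choice of a variance-ratio F and a pooled-variance t are assumptions made for the example, since the report does not give the computing formulas it used.

```python
from math import sqrt
from statistics import mean, variance


def split_by_serial_parity(records):
    """records: iterable of (serial_number, criterion_score) pairs.

    Returns (group_a, group_b): scores for even and odd serial numbers.
    """
    group_a = [score for serial, score in records if serial % 2 == 0]
    group_b = [score for serial, score in records if serial % 2 == 1]
    return group_a, group_b


def variance_ratio_f(x, y):
    """Larger sample variance over the smaller (assumes neither is zero)."""
    vx, vy = variance(x), variance(y)
    return max(vx, vy) / min(vx, vy)


def pooled_t(x, y):
    """Two-sample t statistic using a pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1.0 / nx + 1.0 / ny))


if __name__ == "__main__":
    # Hypothetical (serial, score) records, for illustration only.
    demo = [(1, 10), (2, 12), (3, 11), (4, 13), (5, 12), (6, 14)]
    a, b = split_by_serial_parity(demo)
    print(variance_ratio_f(a, b), pooled_t(a, b))
```

Both statistics would then be referred to the usual F and t tables with the appropriate degrees of freedom; small values support treating the halves as samples from one population.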
estimated by computing the biserial correlation coefficient between
dichotomized item responses and the Aptitude Rating criterion.


Computation of the item validities was facilitated by making use of
the Kolbe and Edgerton table for estimating biserial correlation.
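For readers without access to a computing table, the biserial coefficient for one item can be calculated directly. The sketch below is a hedged illustration in Python using the textbook formula r_bis = ((M1 - M0) / s) * (p * q / y), where y is the normal ordinate at the point dividing the distribution into proportions p and q. The function name, the 0/1 coding of responses, and the use of the population standard deviation are assumptions of this example and are not taken from the report.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(item_responses, criterion):
    """Biserial correlation between a 0/1 item and a continuous criterion.

    item_responses: sequence of 0/1 codes (1 = keyed response).
    criterion: sequence of criterion scores, same length and order.
    """
    keyed = [c for x, c in zip(item_responses, criterion) if x == 1]
    other = [c for x, c in zip(item_responses, criterion) if x == 0]
    if not keyed or not other:
        raise ValueError("both response categories must occur")
    p = len(keyed) / len(criterion)
    q = 1.0 - p
    std_normal = NormalDist()
    # Ordinate of the unit normal curve at the p/q division point.
    ordinate = std_normal.pdf(std_normal.inv_cdf(p))
    return (mean(keyed) - mean(other)) / pstdev(criterion) * (p * q / ordinate)


if __name__ == "__main__":
    # Hypothetical data: four examinees, one item.
    print(biserial_r([1, 1, 0, 0], [3, 4, 1, 2]))
```

Unlike the point-biserial coefficient, the biserial estimate can exceed 1.0 in small or sharply separated samples, which is one reason near-even splits are preferred for item analysis.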


The Aptitude Rating criterion was normalised by dividing the
distribution into equal frequency eighths, and assigning the standard
score equivalent of the mid-point of each eighth in a normal
distribution to each criterion score within that eighth. Thus, all cases
falling in a given eighth of the obtained distribution of criterion
scores received the same standard score equivalent.
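The normalisation into eighths can be sketched in Python as below. Two points are assumptions of this example rather than statements from the report: the "mid-point" of each eighth is taken as the normal deviate at the middle cumulative proportion (1/16, 3/16, ..., 15/16), and tied scores are assigned to eighths in the order produced by a stable sort. The function name is illustrative.

```python
from statistics import NormalDist


def normalize_into_eighths(scores):
    """Replace each score by the normal deviate of its eighth's mid-point.

    Scores are ranked, cut into eight groups of (nearly) equal frequency,
    and every score in a group receives the same standard score.
    """
    n = len(scores)
    std_normal = NormalDist()
    # Normal deviates at cumulative proportions 1/16, 3/16, ..., 15/16.
    midpoints = [std_normal.inv_cdf((2 * k + 1) / 16.0) for k in range(8)]
    order = sorted(range(n), key=lambda i: scores[i])
    normalized = [0.0] * n
    for rank, index in enumerate(order):
        eighth = min(rank * 8 // n, 7)
        normalized[index] = midpoints[eighth]
    return normalized


if __name__ == "__main__":
    # Hypothetical criterion scores, two per eighth.
    print(normalize_into_eighths(list(range(16))))
```

The result has only eight distinct values, so the criterion is coarsened but made approximately normal in shape, which suits the normality assumption behind the biserial coefficient.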

While the same general item analysis procedure was followed for
the West Point study, somewhat different techniques of dichotomizing the
item responses were necessary for the different tests, as follows:


(1) Picture Interpretation Test, 1949 (DA AGO PRT-1775)

The item responses in this test fit a natural dichotomy since the
examinee is instructed to indicate either "Yes" or "No" for each item.
Thus, there is no problem in dichotomising the responses for the
purposes of the biserial correlation type of item analysis.


(2) Army Picture Story Test, Series B, 1950. Syracuse
University Press. As described in Section III, A, Tests, the Army
Picture Story Test requires that the individual choose the most
descriptive and the least descriptive statements from groups of three
items. Within the triad, the item that is considered to be most
descriptive is marked A, while the item that seems to be least
descriptive is marked B, and the intermediate item is not marked. For
purposes of obtaining item frequencies, I.B.M. graphic item counts
were made for each item, for the A or B alternatives. The
trichotomous alternatives for each item were dichotomised in order to apply
the biserial correlation item analysis technique; in doing this, the
extreme alternative ("Best" or "Worst") having the larger frequency
of response was used as one category of the dichotomy, while the
combination of the other extreme with the intermediate
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woas  «j.tUii©u  wie  oilier  category.  This  arrangement  was  use 
to  yield  the  closest  approximation  to  a 50^50^  dichotomy 
maximising  the  stability  of  the  resulting  item  validity  o; 


U9«. 


thus 

efficients. 
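
The dichotomizing rule for a single triad item can be illustrated with a short sketch. The response coding (`"A"`, `"B"`, `None` for an unmarked item), the function name, and the handling of ties in favor of A are assumptions made for the example, not details given in the report.

```python
def dichotomize_triad_item(responses):
    """Collapse A / B / unmarked responses to one item into two categories.

    responses: list of "A" (most descriptive), "B" (least descriptive),
    or None (item left unmarked, i.e. the intermediate alternative).
    """
    n_a = sum(r == "A" for r in responses)
    n_b = sum(r == "B" for r in responses)
    # The extreme chosen more often stands alone as one category
    # (ties go to "A" here, which is an arbitrary assumption).
    keyed = "A" if n_a >= n_b else "B"
    # 1 = keyed extreme; 0 = other extreme pooled with the intermediate.
    return keyed, [1 if r == keyed else 0 for r in responses]

keyed, coded = dichotomize_triad_item(["A", "A", None, "B", "A", None])
```

Because the more frequent extreme is set against everything else, the split is as close to 50%-50% as the three response frequencies allow.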


¹ Kolbe, L. E., and Edgerton, H. A., Table for Computing Biserial r,
J. Exp. Educ., 1936, 4, 245-251.


(3) Picture Fill-In Test, Second Form, 1949 (DA AGO PRT-1725). This
test requires that the individual rate each item on a three-point scale
in regard to the degree to which the item is an appropriate statement,
as explained in Section III, B, Tests. The dichotomy required by
biserial item analysis was achieved by combining the B and C responses.
Thus, the frequencies of response were obtained for the A category vs.
the combined B and C category.

The basis for grouping the B and C responses rather than utilizing
some other combination in order to dichotomize the responses was
twofold. There is reason to believe that B (Is likely to say something
like this) is closer on a continuum to C (Is very likely to say
something like this). Moreover, an inspection of the item responses
indicated that by using the dichotomy of A vs. B and C, the ideal
50%-50% dichotomy was more closely approximated.


(4) West Point Personal Inventory, 1949 (DA AGO PRT-1756).
An item analysis of this test was not necessary since the scoring key
had already been developed in a previous study and was made available
by the Personnel Research Section, AGO, for the validation phase of
the present investigation.


b.  Pattern  Item  Analysis 


Since the items of the Army Picture Story Test are grouped in triads,
it was hypothesized that the pattern of responses might be significant.
In order to investigate this hypothesis, the following pattern analysis
was performed: For each triad, six patterns of responses are possible.
For Group A of the West Point sample, frequency counts were made of the
responses to each of the six patterns for each of the 100 triads. A
level of significance test based on χ² was made among the frequencies
of the patterns for each triad, contrasting upper and lower criterion
groups. This procedure made it possible to estimate the validities of
the pattern responses.
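
A chi-square contrast of upper and lower criterion groups can be sketched as below. The report does not give the exact layout of the test, so this is a generic 2 x k chi-square on frequency counts; the function name and the example counts are hypothetical.

```python
def chi_square(upper_counts, lower_counts):
    """Chi-square for a 2 x k table of frequencies (upper vs. lower group)."""
    n_upper, n_lower = sum(upper_counts), sum(lower_counts)
    total = n_upper + n_lower
    chi2 = 0.0
    for u, l in zip(upper_counts, lower_counts):
        col = u + l
        if col == 0:
            continue  # category never chosen; contributes nothing
        for observed, row_total in ((u, n_upper), (l, n_lower)):
            expected = row_total * col / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# One pattern against all other patterns in the same triad (a 2 x 2 table);
# the counts are invented for illustration.
value = chi_square([30, 70], [15, 85])
```

Passing all six pattern counts for a triad gives an overall test for that triad, while passing one pattern against the remainder, as in the usage line, tests a single pattern.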


c.  Cross-Validation 


In general, the validities of the tests were estimated by computing
Pearson product-moment correlation coefficients between the scoring
keys derived and the Aptitude Rating leadership criterion. In following
this procedure, the two samples, Group A (even serial numbers) and
Group B (odd serial numbers), were treated separately, in order that
the scoring key derived on Group A could be crossed over and validated
on Group B, and the scoring key obtained on Group B could be validated
on Group A. This double cross-validation technique makes use of the
principle of replication in determining which items are consistently
valid in both samples and permits two minimum estimates of the validity
of the

1. Samples

a. Item Analysis Group


The enlisted personnel samples in this study were drawn from Army
Leader's Schools whose mission is to train personnel for leadership,
primarily at the non-commissioned officer level. The students consist
chiefly of privates who have just completed basic training and who
have been recommended by their company officers as evidencing
leadership potential. This source of students is called "pipeline".
Other sources have included reenlistments and National Guard personnel
called up for active duty as a result of the Korean conflict. For these
groups, training at the Leader's Schools is considered a refresher
course. One other important category includes officer candidates who,
at present, are required to complete a leadership course before
attending Officer Candidate Schools. Leader's Schools at Ft. Dix,
N. J., and Ft. Knox, Ky., which train soldiers from ground force units,
were visited to gather the data for item analysis. 956 men were tested
at these two installations.

The sample was divided into two subgroups: privates (including
privates first class) and non-commissioned officers. These groups
differ in average age and military background, factors which might
affect performance on the tests and criteria. Hence, it was considered
desirable to perform separate item analyses and validations for the
two subsamples.

b.  Cross-Validation  Group 

Leader's Schools at Ft. Jackson, S. C., and Ft. Belvoir, Va., were
visited to secure the data for cross-validation. These schools train
personnel from infantry and engineering units, respectively. 368 cases
were utilized for the cross-validation results.


a.  Item  Analysis  Group 

At Ft. Dix and Ft. Knox, the Picture Interpretation Test, 1949
(DA AGO PRT-1775), Picture Fill-In Test, Second Form, 1949 (DA AGO
PRT-1725), and Army Picture Story Test, Series B, 1950, Syracuse
University Press, were administered.


b. Cross-Validation Group


On the basis of the item analysis performed, it was decided to
administer only the Picture Interpretation Test and the Picture Fill-In
Test to the cross-validation sample. In addition, at the request of
PRS, the Leaders' Self-Description Blank, Series [illegible], 1951,
Syracuse University Press, was administered to the same group.


M w nf  th.'.  >T;5'>P’'-"e  *»7  rvj»  i/Vy;  'inC-'v  "' 
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Criteria 


a.  Item  Analysis  Group 

The training cycle at Leader's Schools is divided into two four-week
phases, Phase I and Phase II. Training during Phase I is primarily
academic in nature, whereas Phase II consists primarily of practical
leadership experience in field situations. During the training
program, soldiers are periodically evaluated for their performance on
different criteria by various kinds of raters, i.e., both commissioned
and non-commissioned cadre as well as by their peers.

All soldiers tested in this study were in Phase I of the training
cycle at the time of testing.

The following criterion measures of leadership were obtained for the
group tested at Ft. Dix and Ft. Knox: Faculty Board Rating, Associate
Rating, Leaders' Reaction Test, Rating of Phase II Performance, and
Total Rating (a weighted combination of the foregoing).

Intercorrelations among the above criteria were computed for a sample
of the soldiers tested at Ft. Dix and Ft. Knox. These statistics are
useful for estimating the extent to which the criteria measure
different aspects of leadership. The following table shows these
results (see Table 1).

In both samples, it seems evident that the various criteria are
somewhat unrelated to each other. Although the Associate Rating also
appears to be somewhat different from other ratings of leadership
potential, on the basis of its recommendation by Personnel Research
Section for use in this study, it would seem to be the most
appropriate measure of leadership available. Results from other
Personnel Research Section studies¹ had shown associate ratings to be
superior measures of leadership. In the present study, furthermore,
Associate Ratings were available for more subjects tested than any of
the other ratings.


¹ Wherry, Robert J., and Fryer, Douglas H., "Buddy Ratings: Popularity
Contest or Leadership Criteria?", Personnel Psychology, 1949, 2,
147-159.


Table 1. Intercorrelations among Various Criteria for the Enlisted
Samples
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.24 

All of the above considerations led to the decision to use Associate
Ratings as the criterion for the enlisted sample of this study.

Since this criterion was adopted, it will be valuable to describe in
somewhat greater detail the operations by which scores on this measure
were obtained for the samples. Each student is evaluated by his fellow
students at the end of Phase I training. The students are each given a
"Student Leadership Evaluation Report-Rating Sheet". On this sheet is
a roster of the men in the student's group, customarily numbering from
nine to fifteen men. The student, from this roster, chooses those whom
he thinks the three best leaders and the three poorest leaders. On the
next day, each student is given a "Student Leadership Evaluation
Report-Description Sheet". On this sheet are printed the names of the
men in the group. There are also ten pairs of descriptive statements.
For each man on the roster, the student is to choose the description
in each pair of statements which most appropriately describes the man
being rated. These sheets are then scored by using the keys furnished
by Personnel Research Section. One score is based on the nominating
technique, weights being given to the number of nominations received.
The more nominations a soldier receives which are indicative of better
leadership, the higher his Associate Rating score. The other score is
derived from weights based on empirical evidence as to which
descriptions are more characteristic of better leaders. The scores
from the rating sheets and description sheets are averaged. These
scores constituted the Associate Rating criterion used for item
analysis.


b. Cross-Validation Group


Associate Ratings were also obtained for the soldiers tested in the
cross-validation group during the period from about January through
March, 1952. Test scores were obtained by use of the keys developed
from the item analysis group. These scores were correlated with the
Associate Ratings.

4. Procedure

a. Item Analysis Group

Trainees in Phase I at Ft. Dix were tested in August and October of
1950 and January 1951. The Picture Interpretation, Picture Fill-In,
and Army Picture Story Tests were administered to groups ranging in
size from approximately 50 to 100 trainees. Every effort was made to
elicit the cooperation of the soldiers tested, including some
explanation of the purpose of the study. A total of 480 subjects in
Phase I was tested at Ft. Dix. In field trips made during October 1950
and January 1951, 473 subjects in Phase I were tested at Ft. Knox.
Thus, the total number of trainees tested for the item analysis group
was 958.

b. Cross-Validation Group

Trainees in Phase I at Ft. Jackson, S. C., and Ft. Belvoir, Va., were
tested under conditions similar to those obtaining for the item
analysis group. On the basis of the results of the item analysis, it
was decided to administer only the Picture Interpretation and Picture
Fill-In tests to these groups. An additional test was administered at
the request of Personnel Research Section, the Leaders'
Self-Description Blank. At Ft. Jackson, the number of privates whose
test papers were adequately filled out was 166; non-commissioned
officers numbered 50. At Ft. Belvoir, test papers from 143 privates
were acceptable; the non-commissioned officer sample numbered 9. The
total number for all three tests included 309 privates and 59
non-commissioned officers.

Table 2 summarizes the number of cases used.


Table 2. Number of Cases in Item-Analysis and Cross-Validation Samples

A. Number of Cases in the Item Analysis Group

             Ft. Dix        Ft. Knox       Total

Privates       246            139           385

Non-coms   [illegible]    [illegible]       228


B. Number of Cases in the Cross-Validation Group

             Ft. Jackson    Ft. Belvoir    Total

Privates       166            143           309

Non-coms        50              9            59


5. Analysis

a. Item Analysis Group

For the purpose of item analyzing the three tests used, it was desired
to combine the installation samples, since the resulting keys would be
used irrespective of installation. The following analyses were
performed in order to determine the most appropriate statistical method
for combining the Associate Rating scores from the two installations.

Critical ratios were computed comparing mean Associate Rating scores
obtained by soldiers at Ft. Dix with those at Ft. Knox. The differences
between installations were significant at the 1% level of confidence.

Variance ratios were computed for these data to test the significance
of differences in variability for Associate Rating scores. Differences
in variance were significant at the 2% level of confidence for non-coms
at Knox versus non-coms at Dix. This significant difference in
variability makes ambiguous the interpretation of tests of significance
for mean differences reported, since significant critical ratios
between means may arise because of differences in variability.

The following tables present data from which the above interpretations
were made.


Table 3. Critical Ratios and Variance Ratios for Testing Differences
in Associate Rating Scores

A. Privates

                         Fort Dix     Fort Knox

Mean                      79.16        72.44
Standard Deviation         4.79         4.02
Critical Ratio                  14.6***
Variance Ratio                   1.42*

B. Non-commissioned Officers

                         Fort Dix     Fort Knox

N                           90          137
Mean                      80.44        72.30
Standard Deviation         3.94         5.12
Critical Ratio                  13.[illegible]***
Variance Ratio                   1.68

*** Significant at the 1% level of confidence
**  Significant at the 2% level of confidence
*   Significant at the 10% level of confidence
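
As a rough check on Table 3, the two statistics for privates can be recomputed from the tabled means and standard deviations. The sketch assumes the usual large-sample formulas (difference of means over its standard error, and larger variance over smaller) and takes the sample sizes of 246 and 139 from Table 2; the report does not state which exact formulas or N's were used.

```python
from math import sqrt

def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between two means divided by its standard error."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def variance_ratio(sd1, sd2):
    """Larger variance divided by smaller variance."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    return max(v1, v2) / min(v1, v2)

# Privates: Fort Dix vs. Fort Knox (N's assumed from Table 2).
cr = critical_ratio(79.16, 4.79, 246, 72.44, 4.02, 139)
f = variance_ratio(4.79, 4.02)
print(round(cr, 1), round(f, 2))
```

Under these assumptions the critical ratio comes out near the tabled 14.6 and the variance ratio rounds to the tabled 1.42; small discrepancies would be expected if the original computation divided by N - 1 rather than N.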


On the basis of the preceding analyses, the best procedure for
combining the two installations seemed to be conversion of Associate
Rating raw scores to standard scores within each installation before
pooling the two. Although about 950 men had been tested on each of the
three experimental tests administered, attrition in the number of cases
had occurred as a result of improperly answered tests and also by the
inability to secure criterion measures on some of the subjects. The
graphic item counts are based on a sample of 385 privates and 228
non-commissioned officers. From the graphic item counts, biserial r's
were computed for each item of each of the three tests administered.
The method by which these were computed was analogous to that used for
the cadet officer sample.
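
Converting raw scores to standard scores within each installation before pooling can be sketched as follows. The function name and the example scores are hypothetical, and the population standard deviation is used as an assumption, since the report does not say which divisor was applied.

```python
from statistics import mean, pstdev

def standardize_within(groups):
    """Convert raw scores to z scores within each installation, then pool.

    groups: dict mapping installation name -> list of raw rating scores.
    Returns a list of (installation, z score) pairs.
    """
    pooled = []
    for name, raw in groups.items():
        m, s = mean(raw), pstdev(raw)
        # Each score is expressed relative to its own installation.
        pooled.extend((name, (x - m) / s) for x in raw)
    return pooled

pooled = standardize_within({"Dix": [78, 80, 82], "Knox": [70, 72, 74]})
```

After this step each installation has mean zero and unit spread, so the difference in installation means and variances shown in Table 3 no longer enters the pooled criterion.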

The Associate Rating criterion was normalized by dividing the
distribution into equal frequency eighths, and assigning the standard
score equivalent of the midpoint of each eighth in a normal
distribution to the criterion scores within that eighth.

Graphic item counts for the three projective tests were obtained
separately for the samples of privates and non-commissioned officers.
Each of these two samples had been fractionized into eight subsamples
of equal frequency, after first arranging the cases in ascending
order according to their Associate Rating standard scores.


While the same general item analysis procedure was followed for the
three tests, somewhat different techniques for dichotomizing the item
responses were necessary for the different tests. For the method of
dichotomizing responses used for the different tests, see Section
[illegible], part 5, under cadet officers.


Scoring keys were developed for those tests with promising validity
based on the item analysis results. Items were selected whose
biserial r's were significant at the 10% level of confidence. The
value of biserial r necessary for significance was obtained from the
standard error of biserial r, computed from the following formula¹:

\[ \sigma_{r_{bis}} = \frac{\sqrt{pq}}{y\sqrt{N}} \]

where p and q are the proportions of cases in the two categories of
the dichotomy, y is the ordinate of the normal curve at the point of
division, and N is the number of cases. Significant biserial r's are a
function of the percentage of cases in each dichotomized group, as well
as the confidence level adopted. For a 50% dichotomous split, an r of
.105 was required for significance at the 10% level of confidence in
the privates' sample. For the non-commissioned sample, an r of .137 was
required for significance at this level of confidence.
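
The required values of biserial r can be approximated with a short sketch. It assumes that the standard error of a zero biserial r is sqrt(pq) / (y sqrt(N)), with y the normal ordinate at the point of dichotomy, and that a two-tailed normal-curve test was applied; both are assumptions, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def required_biserial_r(n, p=0.5, alpha=0.10):
    """Smallest |biserial r| significant at level alpha (two-tailed)."""
    nd = NormalDist()
    q = 1.0 - p
    y = nd.pdf(nd.inv_cdf(p))          # ordinate at the point of dichotomy
    se = sqrt(p * q) / (y * sqrt(n))   # assumed standard error when r = 0
    z = nd.inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical value
    return z * se

r_privates = required_biserial_r(385)   # privates' sample
r_noncoms = required_biserial_r(228)    # non-commissioned sample
```

With a 50% split these assumptions give about .105 for 385 cases and about .137 for 228 cases, which agrees with the values quoted in the text; a less even split raises the required r.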


One key was developed for the Picture Interpretation Test in addition
to those obtained as above. This test was selected for special study
in an effort to discover the nature of those items which yielded
significant biserial r's. Two judges classified the significant items
from this test into 13 categories suggested by the kinds of pictures
which produced significant responses. Those non-significant items
whose content fitted the classification scheme adopted were also
placed into these categories. It was reasoned that if the
classifications used were indicative of real relationships between
item content and criterion, non-significant items in the same
classification might show correlations with the criterion having the
same direction as that found for the statistically significant items
classified in the same category.


To check on the extent to which non-significant items were predicted
with correct signs for the various categories, the following analysis
was performed. The proportions of positive and negative biserial r's
among all non-significant items classified were determined. Likewise
the proportion of positive or negative items allocated to each category
was determined. The differences between proportions in each category
and in the total were then tested for significance. If these
differences were significant, it was concluded that the classification
of items within these categories was meaningful. By this procedure, an
item classification key of 83 items was developed. The item-analysis
key for this test contained 39 items.

because they lacked Associate Rating scores as a result of having been
dropped, for various reasons, from the Leader's Course. There were
[illegible] such cases who had complete sets of tests. Mean scores were
obtained for this group on the Picture Interpretation Test and the
Picture Fill-In Test by scoring their papers with the item analysis
keys developed as described previously.


It was desired to compare the mean score for dropouts to the mean
score of the total item analysis group. It was postulated that, if a
test were valid, dropout mean scores on the tests would be
significantly lower than the means of the item analysis group. This
assumes that dropouts, had they been rated, would have received
relatively low Associate Rating scores.


In order to estimate the mean test scores of the item analysis group,
it was necessary to score their papers with the item analysis keys.
Rather than scoring test papers for the entire sample of 385 privates,
5[illegible] cases were selected at random from this group. A
stratified random sampling technique was used since the 385 privates
had been fractionated into eighths on the basis of Associate Ratings
for item analysis purposes (see page 9). Furthermore, for both this
group and the dropouts, a different keying of responses was used to
simplify scoring than that used later for the cross-validation sample.
Since the keying of items is arbitrary, the two keying methods used
result in scores which differ only by a constant.


For testing the significance of the differences between means of the
dropout and item analysis group, t tests were computed. The standard
error for the difference between means was adjusted in the formula to
take account of the use of a stratified random sample.¹


b. Cross-Validation Group


Using the item-analysis keys, the Picture Fill-In and Picture
Interpretation Tests were scored for the cross-validation sample. The
Picture Interpretation Test was also scored for the item
classification key described above. The Leaders' Self-Description
Blank was scored by using the key furnished by the Personnel Research
Section.

Reliability coefficients were computed for the Picture Interpretation
and Picture Fill-In tests. The method of computation used was the
correlation between scores from odd-and-even numbered items, augmented
by the Spearman-Brown prophecy formula to estimate the reliability of
the whole test.
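
The odd-even reliability estimate can be sketched as follows. The data layout (one list of item scores per examinee) and the function names are assumptions made for illustration; the Spearman-Brown step uses the standard formula 2r / (1 + r) for doubling test length.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(item_scores):
    """item_scores: one list of item scores per examinee, in test order."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown step-up to full length

reliability = split_half_reliability(
    [[1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0]]
)
```

The correlation between halves describes a test of half length, which is why it is stepped up before being reported as the reliability of the whole test.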


¹ McNemar, Q., Psychological Statistics. John Wiley and Sons, Inc.,
New York, 1949, pp. [illegible].


Validity coefficients were computed for the cross-validation sample.
Scores on each of the three tests, obtained as described above, were
correlated with the Associate Rating criterion. The correlations were
computed separately for the Ft. Jackson and Ft. Belvoir samples, and
for the two combined.

Before combining the two installations into a total sample, it was
advisable to test whether mean Associate Rating scores for the two
installations differed significantly. A critical ratio was computed
for this difference; if it was not statistically significant,
Associate Rating scores would not be converted to standard scores for
computation of the validity coefficients.

To test whether the validity coefficients differed significantly for
the two installations, critical ratios were computed for the
difference between two sample correlation coefficients. An r to z
transformation was made prior to this statistical test.
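
The comparison of two independent correlation coefficients can be sketched with Fisher's r to z transformation. The function name and the example values are hypothetical; the standard error used is the usual sqrt(1/(N1 - 3) + 1/(N2 - 3)).

```python
from math import atanh, sqrt

def cr_between_correlations(r1, n1, r2, n2):
    """Critical ratio for the difference between two independent r's."""
    z1, z2 = atanh(r1), atanh(r2)            # Fisher r-to-z transformation
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of z1 - z2
    return (z1 - z2) / se

# Invented validity coefficients for two installations of unequal size.
ratio = cr_between_correlations(0.25, 166, 0.10, 143)
```

The transformation is used because the sampling distribution of r is skewed when the true correlation is not zero, whereas z is approximately normal with a standard error that depends only on N.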


Two multiple correlation coefficients were computed, by the
Wherry-Doolittle method, with Associate Ratings as the criterion
variable in both cases, and as the predictor variables (1) Picture
Interpretation Test and Picture Fill-In Test, and (2) Picture
Interpretation Test, Picture Fill-In Test, and Leaders'
Self-Description Blank. Only those trainees who had completed all
three tests and had received an Associate Rating were utilized for
this analysis.
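
For the two-predictor case, the multiple correlation can be illustrated with the standard closed-form expression. This is not the Wherry-Doolittle procedure itself, which works through the predictors successively; it is a sketch of the quantity being estimated, and the function name and example values are hypothetical.

```python
from math import sqrt

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple R of a criterion on two predictors from three correlations.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)

# Two uncorrelated predictors with invented validities of .30 and .40.
big_r = multiple_r_two_predictors(0.30, 0.40, 0.0)
```

When the predictors are uncorrelated the squared validities simply add, so the example gives a multiple R of about .50; correlated predictors with validities of the same sign add less.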


IV. RESULTS


A. Results with Cadet Officers


1. Picture Interpretation Test, 1949 (DA AGO PRT-1775)
a. Item Analysis¹
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validities  failed  to  exceed  chance  expectancy,  indicating  a lack  of 
validity for the test with this sample. A scatterplot was prepared
showing the relationship of the obtained validity coefficients of
Group A (N = 211) vs. the corresponding coefficients of Group B
(N = 223). The correlation between the two sets of validity
coefficients was approximately zero, indicating little consistency of
item validity from sub-sample to sub-sample. This finding, together
with the low proportion of significantly valid items, strongly suggests
that the test does not possess sufficient validity for the prediction
of leadership with West Point cadets.


A qualitative  investigation  of  those  items  for  which  combined 
validities  exceeded  chance  expectancy  did  not  yield  logical  categories 
or  trends  which  were  considered  to  be  psychologically  meaningful. 

Table  4 shows  the  distribution  of  the  item  validities  in  the 
Picture  Interpretation  Test. 


b.  Cross-Validation 


To substantiate the evidence from the item analysis, the validity of
the Picture Interpretation Test was estimated by computing the
correlation coefficient between the scoring keys derived on the item
analysis samples and the Aptitude Rating leadership criterion. For
Group A (N = 210) the scoring key yielded a correlation coefficient of
.12. For Group B (N = 222) the correlation coefficient was found to be
.07, indicating the lack of appreciable validity of this test for
these samples.
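
The double cross-validation design behind these two coefficients can be sketched as below. The sketch is a simplification: it selects items by Pearson correlation with the criterion rather than by the biserial r used in the study, the cutoff of .10 is arbitrary, and all function names and the data layout are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson r; returns 0.0 when either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0

def derive_key(items, criterion, cutoff):
    """Return {item index: +1 or -1} for items with |r| >= cutoff."""
    key = {}
    for j in range(len(items[0])):
        r = pearson([row[j] for row in items], criterion)
        if abs(r) >= cutoff:
            key[j] = 1 if r > 0 else -1
    return key

def score(items, key):
    """Sum of keyed item responses for each examinee."""
    return [sum(w * row[j] for j, w in key.items()) for row in items]

def double_cross_validate(cases, cutoff=0.10):
    """cases: list of (serial_number, item_responses, criterion_score)."""
    a = [c for c in cases if c[0] % 2 == 0]   # Group A: even serial numbers
    b = [c for c in cases if c[0] % 2 == 1]   # Group B: odd serial numbers
    result = {}
    for name, derive, check in (("key A on B", a, b), ("key B on A", b, a)):
        key = derive_key([c[1] for c in derive], [c[2] for c in derive], cutoff)
        result[name] = pearson(score([c[1] for c in check], key),
                               [c[2] for c in check])
    return result
```

Each key is scored only on the half it was not derived from, so items that were selected by chance in one half have no built-in advantage in the other.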


¹ Item validities have been reported in detail for each of the tests in
tables included in regular monthly progress reports submitted to the
Department of the Army during the course of the study. Slightly
different N's from sample to sample and from test to test are the
result of incomplete data on a few cases in the sample tested.


Table 4. Distribution of Item Validities in the Picture Interpretation
Test for Two West Point Samples.
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.20-. 24 

16 

.25—. 29 

11 
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11 
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4 
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2 

• 35— . 39 

1 

.36-. 39 

0 

Total 

429  items* 

.40-. 44 

0 

. 45— , 49 

1 

Total 

432 items

* The total number of items for which it was possible to compute
validity coefficients was 429 in Group A, since three items (no. 184,
344, and 363) yielded no responses in one category of the dichotomy.


2. Army Picture Story Test, Series B, 1950, Syracuse University
Press.

a. Item Analysis

The item analysis of the Army Picture Story Test yielded only a chance
proportion of statistically significant items. The scatterplot between
the validity coefficients in Group A (N = 213) and Group B (N = 222)
for the Army Picture Story Test indicated a near zero relationship.
However, as in the case of the Picture Interpretation Test, the
subjects in the two samples, Group A and Group B, did respond similarly
to individual items, indicating a marked degree of inter-sample
consistency.

A qualitative analysis of the significant items of the Army Picture
Story Test also failed to disclose meaningful categories or trends in
terms of postulated leadership characteristics.

Table 5 shows the distribution of item validities in the Army Picture
Story Test.

b. Pattern Item Analysis¹

Of the 600 possible patterns of response in the Army Picture Story
Test, 30% of the patterns showed significantly high criterion
relationships at the 30% level of confidence in Group A (N = 213). The
percentage of significant patterns may not be beyond chance
expectations because of inter-pattern correlation. However, the degree
to which the relationships among patterns affect the number of patterns
appearing to possess significant validities is impossible to
determine. Thus, a scoring key was constructed on the basis of the
significant patterns by assigning a weight of +1 to patterns with
positive validity at the 30% level of confidence, and -1 to patterns
with negative validity at this level.
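
Scoring an examinee with such a pattern key amounts to summing the weights of the patterns he chose. In the sketch below the pattern labels, the key contents, and the function name are hypothetical; patterns that did not reach significance are treated as carrying a weight of zero.

```python
def score_pattern_key(chosen_patterns, key):
    """Sum the +1 / -1 weights of the response patterns an examinee chose.

    chosen_patterns: list of (triad number, pattern label) pairs.
    key: dict mapping (triad number, pattern label) -> +1 or -1.
    Patterns absent from the key contribute 0.
    """
    return sum(key.get(tp, 0) for tp in chosen_patterns)

# Invented key: one positively and one negatively weighted pattern.
pattern_key = {(1, "A-B"): 1, (2, "B-A"): -1}
total = score_pattern_key([(1, "A-B"), (2, "B-A"), (3, "AB-")], pattern_key)
```

Since only one pattern can be chosen per triad, each of the 100 triads contributes at most one weight to the total score.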

c. Cross-Validation


The validity of the item analysis keys of the Army Picture Story Test
was estimated by calculating the Pearson product-moment correlation
coefficient between test scores and the Aptitude Rating leadership
criterion. Group A (N = 210) and Group B (N = 222) yielded validity
coefficients of -.08 and -.05 respectively, indicating essentially
zero validity for the test with these samples.


¹ Validities of each pattern have been reported in regular monthly
progress reports submitted to the Department of the Army during the

Table 5. Distribution of Item Validities in the Army Picture Story
Test for Two West Point Samples.

Using the key derived by the pattern analysis on Group A, the scores
of Group B (N = 223) were calculated. The Pearson product-moment
correlation coefficient between the test scores and the Aptitude
Rating leadership criterion was .01, indicating the lack of validity
in the pattern key for this sample.

3. Picture Fill-In Test, Second Form, 1949 (DA AGO PRT-1725)
a. Item Analysis

The proportion of statistically significant items failed to exceed
chance expectancy. The scatterplot between the correlation
coefficients for Group A (N = 213) and Group B (N = 223) indicated
approximately a zero relationship for the Picture Fill-In Test. Thus,
again negative evidence was found in regard to the inter-sample
consistency of item validity. As in the case of the two tests
mentioned previously, a qualitative analysis of the significant items
found in the two samples failed to reveal categories of responses
which seemed to be psychologically meaningful.

However, there was evidence of considerable consistency between
samples in the proportion of individuals who responded in the same
way to the items in the test. This same indication of the inter-sample
consistency of the responses was also found for the two tests discussed
previously: the Picture Interpretation Test and the Army Picture Story
Test.


The following table presents the distribution of item validities
(see Table 6).

b. Cross-Validation

Estimates of the validity of the Picture Fill-In Test
were obtained by computing the Pearson product-moment correlation
coefficients between the item analysis keys and the Aptitude Rating
criterion of leadership. The validity of the test for the West Point
samples was found to be approximately zero: in Group A (N = 210) the
validity coefficient was .04 and in Group B (N = 822) the validity
coefficient was .03.

4. West Point Personal Inventory, 1949 (DA AGO PRT-1756)

Table 6. Distribution of Item Validities in the Picture Fill-In
Test for Two West Point Samples.

the West Point sample. The Pearson product-moment correlation
coefficient between the scores on this test and the Aptitude Rating leadership
criterion was .351, based on a sample of 436 cadets. This value of .351
is a statistically significant validity coefficient, since a value of
.128 is required for significance at the 1% level of confidence.

Tables 8, 9, and 10 show the distribution of biserial
r's obtained for items in each test. The sign of the coefficients
was disregarded in tabulating these results, since keying the res-


Table 7 summarizes the number of significant items found for the
two samples at the 10% level of confidence in each of the three tests.

Table 7. Number of Significant Items for Three Experimental
Personality Tests.

                                 Total No.    Number of
                                 of Items     Significant Items

Picture Interpretation Test
  1. Privates (N = 385)            432            99
  2. Non-Coms (N = 228)            432            53

Picture Fill-In Test
  1. Privates                      392           130
  2. Non-Coms                      392            38

Picture Story Test
  1. Privates                      300            48
  2. Non-Coms                      300            45

From this table, it is apparent that more significant items are
found for the sample of privates than is true for non-commissioned
officers. By inspecting the individual items, it is also apparent
that those items found significant in the privates' sample generally
are not found significant in the non-commissioned officer sample.

The results indicate, furthermore, that the Picture Fill-In Test and
Picture Interpretation Test are functioning in the sample of privates
at a level appreciably better than chance expectancy. Since the 10%
level of confidence was adopted, chance, on the average, would result

42 for the Picture Interpretation Test and 30 for the Army Picture

Table 10. Distribution of Item Validities in the Picture Fill-In Test
for Two Enlisted Samples.

For the Army Picture Story Test, the number of significant items
was about 1.6 times what would be expected on the basis of chance.
This figure is not large enough to warrant concluding that non-chance
relationships are involved, particularly since the assumption of
non-correlation among items is untenable. It is possible that a pattern
analysis might reveal more convincing evidence for the validity of
this test, in view of the triad form of item responses. Experience
with the pattern analysis performed for the cadet officer sample did
not encourage a parallel analysis for the enlisted sample. In view


test, it was not administered to the cross-validation sample.


For the Picture Interpretation Test, about 2.3 times as many
items were found significant for the sample of privates as would
be expected on the basis of chance; for the Picture Fill-In Test,
this figure was about 3.3. These tests were selected to be administered
to the cross-validation sample, since the evidence is suggestive
of validity.
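
The chance-expectancy ratios quoted above can be retraced from the item counts in Table 7. The following is a minimal sketch, assuming that the expected number of chance "hits" is simply the item count times the 10% significance level; the function and variable names are illustrative and do not come from the report.

```python
# Sketch: ratio of observed significant items to the number expected by
# chance at the 10% level, using the privates' counts from Table 7.
ALPHA = 0.10  # significance level adopted for item selection

# (total items, items found significant among privates)
PRIVATES_COUNTS = {
    "Picture Interpretation Test": (432, 99),
    "Picture Fill-In Test": (392, 130),
    "Army Picture Story Test": (300, 48),
}


def chance_ratio(total_items, significant_items, alpha=ALPHA):
    """Observed significant items divided by the count expected by chance."""
    expected_by_chance = total_items * alpha
    return significant_items / expected_by_chance


ratios = {
    name: round(chance_ratio(total, hits), 1)
    for name, (total, hits) in PRIVATES_COUNTS.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio} times chance expectancy")
```

As the text notes, this simple expectation treats items as uncorrelated, which is why a ratio as low as 1.6 is not taken as convincing evidence.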

b. Comparison of Item-Analysis Group and Dropouts

Table 11 shows t tests for the significance of the mean
differences between dropouts and the item analysis group. It had
been expected that dropouts would show lower mean test scores for the
Picture Fill-In Test and for the Picture Interpretation Test. Such
is the case, and furthermore, this difference is significant at the
5% level of confidence for the Picture Interpretation Test, and at
the 1% level of confidence for the Picture Fill-In Test.

2.  Cross-Validation 


a. Reliabilities and Related Statistics


Table 12 summarizes reliabilities and related statistics
calculated from the cross-validation sample for the Picture Fill-In
Test and the Picture Interpretation Test.


These results indicate that scores for these instruments are
sufficiently reliable to be useful for large-scale classification

Table 12. Reliabilities and Related Statistics Estimated from Cross-
Validation Samples at Leaders' Schools for the Picture
Interpretation and Picture Fill-In Tests

Differences in mean criterion scores for the two installations
were not statistically significant at the 5% level of confidence.
On the basis of this result, validity coefficients were computed
[illegible] standard scores.


Tests of the significance of the difference between sample
correlation coefficients were performed in order to compare validity
coefficients at the two installations. These results fail to reveal
a difference, significant at the 5% level of confidence, between the
validity coefficients at the two installations. The tests seem to be
about equally valid in both of the cross-validation samples.
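
One common way to test the difference between two independent correlation coefficients is Fisher's z transformation. The sketch below illustrates that approach; the report does not state which procedure was actually used, and the coefficients and sample sizes shown are hypothetical values chosen only for illustration.

```python
# Sketch: Fisher z test for the difference between two independent
# correlations. All numeric inputs below are hypothetical.
from math import atanh, sqrt
from statistics import NormalDist


def z_for_difference(r1, n1, r2, n2):
    """Standard normal deviate for the difference between two correlations."""
    standard_error = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (atanh(r1) - atanh(r2)) / standard_error


def two_tailed_p(z):
    """Two-tailed probability of a deviate at least as extreme as z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical validity coefficients at two installations.
z_value = z_for_difference(r1=0.30, n1=150, r2=0.20, n2=146)
p_value = two_tailed_p(z_value)
differs_at_5_percent = p_value < 0.05

print(f"z = {z_value:.2f}, p = {p_value:.2f}, differs: {differs_at_5_percent}")
```

With samples of this size, a difference of .10 between two coefficients falls well short of significance, which is consistent in spirit with the finding that the two installations did not differ.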

(3) Item Classification Key for the Picture Interpretation Test.


Table 14 presents validity coefficients for the item
classification key, for the item analysis key, for the combination
of the two, and the correlation between the item analysis and item
classification keys. The criterion used for the validity
coefficients was that of Associate Ratings.

The fact that the item classification key correlates significantly
with the item analysis key is interpreted to mean that these
categories have an appreciable degree of internal consistency. The
failure of the items classified to correlate significantly with the
criterion is an indication of their consistent lack of validity for
this criterion even in a cross-validation sample.

(3) Multiple Correlation

Table 15 shows the multiple correlations for the total
cross-validation sample, and the intercorrelations from which the
R's were computed.

The standard errors of these R's are such as to render no combination
of the tests appreciably superior in prediction to another,
nor, indeed, to the Leaders Self-Description Blank alone.

The conclusions to be derived from the results described in the
preceding sections [remainder illegible]

1. Validity of the three objective projective tests for measuring
leadership performance of West Point cadets.


1. In none of the three tests does the proportion of items
correlate with the leadership criterion (Aptitude Rating) appreciably
beyond chance expectation. This follows from the fact that the number
of items found statistically significant at a given level of confidence
is no greater than the number which would manifest that degree of
validity through sampling fluctuations about a true validity of zero.

2. When scoring keys are developed independently for each
of two representative samples by keying items whose individual validity
coefficients appear to be statistically significant, there is
practically no better than chance correspondence in the items comprised
within the two keys. Hence, it can be inferred that there is inadequate
consistency of scoring keys from sample to sample. Furthermore,
each of the two keys for each test correlates approximately zero
with the leadership criterion in its cross-validation sample.

3. Duplicating previous findings of the Personnel Research
Section, the West Point Personal Inventory is found to correlate
appreciably and significantly with the leadership criterion. In
addition to demonstrating the validity of the test, this indicates
that the criterion is predictable.
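
The significance of the Personal Inventory coefficient (reported as .351 on 436 cadets) can be checked by computing the smallest correlation that would differ from zero at the 1% level. The sketch below uses the Fisher z approximation, which is an assumption on our part; the report does not say how its own threshold was obtained, and the function name is illustrative.

```python
# Sketch: smallest correlation significantly different from zero at a
# given two-tailed level, via the Fisher z approximation.
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(sample_size, alpha=0.01):
    """Approximate two-tailed critical value of r for the given sample size."""
    z_critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return tanh(z_critical / sqrt(sample_size - 3))


r_observed = 0.351   # West Point Personal Inventory validity coefficient
n_cadets = 436       # sample size given in the report

r_needed = critical_r(n_cadets)
is_significant = r_observed > r_needed

print(f"critical r = {r_needed:.3f}; observed r = {r_observed}")
```

Under this approximation the critical value comes out a little above .12, close to the threshold quoted in the report, and the observed coefficient exceeds it by a wide margin.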


Discussion. This investigation failed to reveal a stable and
valid method of keying the responses to three objective projective
tests so as to predict leadership among West Point cadets with
better-than-chance efficiency. This failure cannot be attributed totally to
inadequacy of the leadership criterion, the Aptitude Rating, for it
is predictable from scores on the West Point Personal Inventory.

3. Validity of three objective projective tests for measuring
leadership performance of Army non-commissioned personnel.

Picture Fill-In Tests, are applied in new samples of privates enrolled
in Leaders' Schools, the Spearman-Brown reliabilities of the scores
are .85 and .91, respectively.
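
For readers unfamiliar with the Spearman-Brown step-up, the following minimal sketch shows how a half-test correlation is converted into a full-length reliability, and what half-test correlations the reported values of .85 and .91 would imply. The half-test figures are back-calculated here for illustration only; they are not given in the report.

```python
# Sketch: Spearman-Brown prophecy formula and its inverse for a test
# split into two halves. Function names are illustrative.
def spearman_brown(r_half, length_factor=2.0):
    """Reliability of a test lengthened by length_factor."""
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


def implied_half_test_r(r_full):
    """Half-test correlation implied by a stepped-up full-test reliability."""
    return r_full / (2.0 - r_full)


half_interpretation = implied_half_test_r(0.85)  # Picture Interpretation Test
half_fill_in = implied_half_test_r(0.91)         # Picture Fill-In Test

print(f"implied half-test r: {half_interpretation:.3f}, {half_fill_in:.3f}")
print(f"stepped up again: {spearman_brown(half_interpretation):.2f}, "
      f"{spearman_brown(half_fill_in):.2f}")
```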

The correlation of these scores with the Associate Rating
criterion yields a validity coefficient of .25 for the Picture
Interpretation Test and .19 for the Picture Fill-In Test; these
validity coefficients are significant at beyond the 1% level of
confidence.


These keys also discriminate, on the average, between trainees
who, for various reasons (including lack of leadership potential),
are separated early in the program and those who are graduated.

b. (No cross-validation was undertaken with non-commissioned
officers, in view of the negative results of the item analysis.)

3. a. The Leaders' Self-Description Blank is found to correlate
appreciably and significantly with the Associate Rating criterion
for privates. The validity coefficients of this and the other two
tests do not differ from one another at the 5% level of confidence.

b. The multiple correlation between the criterion and the
two projective tests is not appreciably higher than the validity
coefficient of the Picture Interpretation Test alone. The multiple
correlation between the criterion and the two projective tests plus
the Leaders' Self-Description Blank is not appreciably higher than
the validity coefficient of the last-named test alone.


c. The Picture Interpretation and Picture Fill-In Tests are
virtually uncorrelated with one another. This, in the light of their
relatively high reliabilities, suggests considerable independence of
the factors measured. Both of the tests correlate somewhat more highly
with the Leaders' Self-Description Blank, although still at a level far
below their reliability coefficients.
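
The relation between two validity coefficients, the intercorrelation of the two predictors, and the resulting multiple correlation can be sketched as follows. This is an illustration of the standard two-predictor formula rather than a reconstruction of Table 15; the inputs are the validities of .25 and .19 and the intercorrelation of .18 quoted elsewhere in the report, and the function name is illustrative.

```python
# Sketch: multiple correlation of a criterion with two predictors,
# computed from the three pairwise correlations.
from math import sqrt


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple R of criterion y on predictors 1 and 2."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (
        1.0 - r_12 ** 2
    )
    return sqrt(r_squared)


# Validities and intercorrelation as quoted in the report's text.
multiple_r = multiple_r_two_predictors(r_y1=0.25, r_y2=0.19, r_12=0.18)

print(f"multiple R for the two projective tests: {multiple_r:.2f}")
```

With these inputs the formula gives a value of about .29, only modestly above the .25 of the Picture Interpretation Test alone, which is in keeping with the statement that the gain is not appreciable once sampling error is allowed for.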


Discussion. The Army Picture Story Test revealed evidence of
validity in neither the sample of non-commissioned officers nor of
privates. The Picture Interpretation Test and the Picture Fill-In
Test evidenced no validity in the non-commissioned officer sample but

The fact that these tests correlate significantly and appreciably
with a [illegible] criterion of leadership performance among privates, and
their relative independence of one another and the Leaders' Self-

instrument that may constitute an important contribution to the
techniques for identifying potential leaders. However, in the light of
the results, this statement must be made with certain limitations.

In the first place, the failure of the tests to function with samples
of cadets and non-commissioned officers indicates that they may be
most effective with personnel of relatively little sophistication and
limited experience. Secondly, it should be recalled that the predictive
value of the tests has been shown only for the Associate Rating
criterion; this criterion probably does not reflect all aspects of
leadership performance.

Conversely, the positive results that have been obtained with
these measures by no means define the limits of their value. They
may, for example, correlate as well or better with other aspects of
Army leadership performance, or with similar criteria in other types
of leadership situations, such as under field conditions. Their
validity for other types of performance involving personality factors,
e.g., adaptability to stress situations, is likewise in the realm of
possibility. At present, these are, of course, moot points. But the
promise evidenced by these tests for the limited criterion and
situations involved in this study implies the advisability of exploring their
value for other criteria and situations.


Such further work should, however,

present tests. Many items which failed to show validity in this study
could well be eliminated, and replaced by other items constructed
along lines which the present study suggests are more fruitful.
Suggestions for such future modifications are incorporated in the
following section.

For one thing, the evidence is that the tests, as presently
constituted, are valid only within a limited segment of personnel in
connection with whom leadership assessments would be made, namely,
the youngest and least experienced groups. It behooves us to consider
why this should be the case and what may be done to augment the range
of utility of these instruments.

For another thing, the evidence does not suggest that these tests
should be used to supplant or supplement the Leaders' Self-Description
Blank, which is currently available for assessing leadership potential
of enlisted personnel. The reason for this statement is that neither of
the two new tests correlates better with the criterion than the Leaders'
Self-Description Blank, nor is the multiple correlation of the three
tests appreciably higher than the validity of the Leaders' Self-Description
Blank. However, it should be noted that the failure of the new tests to
add appreciably to the validity of the Leaders' Self-Description Blank
is not a function mainly of the degree of overlap among the measures;
rather, the failure is primarily attributable to the relatively low (even
though significant) validity manifested by the new tests. It would thus
seem that revision of these tests so as to augment their predictive value
could well result in useful additions to current selection instruments.


There are therefore two dimensions along which revision should
proceed: (1) broadening of the range of personnel to which the tests
are applicable, and (2) intensifying the discrimination power of the
items within this range. Conceivably, of course, the two objectives may
not be attainable with a single form of each test, so that a different
form may be required for each of several types of personnel, e.g.,
commissioned and enlisted.


Examination of the characteristics of the various items serves as
a source of hypotheses for effecting these improvements. For example,
extending the effective range of these tests to reach more highly
educated and experienced personnel seems to require modification in
two respects: (1) the situations depicted should be more appropriate
to the interests and vital experiences of such personnel; at present,
most of the situations appear to be rather simple, socially and
motivationally, making it difficult to elicit real identification and
projection on the part of more sophisticated subjects; (2) in spite of the
projective approach, there may still be too much transparency in regard

for this work.


Among these clues are the following:

A. For the Picture Interpretation Test, items depicting high or low
prestige activities seem more frequently to be valid. The same is
true of items portraying leadership or dominance.

B. For the Picture Fill-In Test, the discriminating responses seem
to be those which reflect extra-punitiveness and intra-punitiveness.
Social appropriateness or inter-personal skill seems to be another
dimension along which a number of the items discriminate.

The elimination of the many non-discriminating items from the
present forms of both of these tests, and their replacement by items
constructed along the lines suggested above, may well be productive
of greater validity.

Most of the facts needed to effect modifications along lines
suggested above already are at hand. For example, the types of items
which seem most productive of validity could readily be classified
beyond the point already accomplished. Also, as regards the
forced-choice possibility, preference values (defined in terms of frequency
of response selection) and validity coefficients are available for
all items.


The basic decision that must first be made is whether or not to
proceed further with instruments of this type. It is felt that the
evidence disclosed in this investigation is that the Army possesses
at least two new-type tests which show appreciable validity for
leadership criteria, while being, at the same time, relatively
independent of existing instruments used for leadership assessment. These
considerations cogently denote the promise inherent in further
exploration and development along these lines. This is particularly true in
view of the critical need for techniques of leadership assessment and
the paucity of existing means for meeting this need.

ABSTRACT


U. S. Dept. Army. The Adjutant General's Office. Personnel
Research Branch. Validation of three objectively scored
pictorial tests of personality for the assessment of leadership.
Personnel Research Branch Report 958, 31 May 1952.
39 pp. Washington: American Documentation Institute c/o
Library of Congress, Document No. 39YS~, microfilm, [illegible];
photocopy, [illegible]. 3 multiple-choice projective tests were
administered to each of 3 Army samples: privates and noncommissioned
officers in Leaders' Course, and West Point cadets.
The Army Picture Story Test was patterned after the Thematic
Apperception Test; the Picture Fill-In Test was patterned
after the Picture Frustration Test; the Picture Interpretation
Test involved selective identification with individuals
depicted in various roles and activities. Item analyses
(against leadership ratings made by associates) failed to reveal
better than chance distributions of item validities for the
samples of cadets and noncommissioned officers. For the privates,
it was possible to develop stable scoring keys for the last 2
tests named. Cross-validation produced validity coefficients of
.19 for the Picture Fill-In Test and .25 for the Picture Interpreta-

THE EFFECTIVENESS OF PICTORIAL TESTS OF PERSONALITY
IN THE ASSESSMENT OF LEADERSHIP
(Based on PRB Report 958)


STATEMENT OF THE PROBLEM

One of the most important problems of personnel management in the
Army is identifying (1) men with qualities of leadership, and (2) men
who can readily be trained as officers and noncommissioned leaders.
Various tests and procedures have been used; some have been more
successful than others. The purpose of the present study was to evaluate three
new pictorial tests of personality as predictors of leadership ability
at the U. S. Military Academy and in Leader's Schools.

RESULTS 

1.  For  a sample  of  privates  enrolled  at  Leader's  Schools,  two  out 
of  three  of  the  tests  gave  a better  than  chance  differentiation  between 
men  rated  high  by  their  associates  and  men  rated  low. 

2. The power of these tests to distinguish between high rated
privates and low rated privates is about the same as a test already
in use, the "Leaders Self-Description Blank." The new tests are not
closely related to the old one.

3. However, for noncommissioned officers at Leader's Schools, no
one of the tests differentiated between high rated men and low rated
men.


4.  None  of  the  tests  gave  scores  related  to  Aptitude-for-Service 
Ratings  for  cadets  at  the  Military  Academy. 

CONCLUSIONS 

1. The validity of pictorial tests used in this experiment was
insufficient to add significantly to the validity attainable with
Self-Description Blanks previously developed by the Personnel Research
Section, Personnel Research and Procedures Branch, The Adjutant General's
Office.

2. In order for pictorial tests to become effective leadership
predictors, it appears necessary to effect improvement in item content
and format. Whether such improvement would be sufficient to warrant
the cost is debatable.

WORK SUMMARY

Three new tests, the Picture Interpretation Test, Army Picture Story
Test, and the Picture Fill-In Test, were administered to 216 cadets at
the Military Academy and 968 enlisted men in Leader's Schools. In
addition, the West Point Personal Inventory was administered to the
cadets, and the Leaders Self-Description Blank to the enlisted men.


Responses to test items and total test scores were compared with an
independent measure of leadership, the Aptitude-for-Service Rating for
cadets or the associate rating for enlisted men, and verified on
additional groups of 238 cadets and 368 enlisted men.

PREFACE


This is a report of a research study initiated in May, 1950, and
performed under Contract Number DA-49-083 OSA-64, negotiated under
authority of Section 2 (c) (5), Act of February 19, 1948 (Public Law 413,
80th Congress).

of the Personnel Research Section, The Adjutant General's Office,
Department of the Army. Particularly noteworthy were the contributions
made by Drs. D. E. Baier, H. E. Brogden, R. Perloff, E. K. Taylor, and
the late Dr. C. I. Mosier.

On the staff of the contractor, indispensable collaboration was
furnished by a number of individuals in addition to those whose names
appear, somewhat arbitrarily, in authorship. Among them are
Dr. Ernst G. Beier, Mr. D. K. Hable, Miss Mildred 1. Leonard,
Mr. Holland Tougas, and Mrs. Elizabeth R. Coleman and her scoring
staff.


The cooperation of the authorities at Forts Belvoir, Dix,
Jackson, and Knox, and at the U. S. Military Academy, notably
Lt. Col. Raymond Rumpf, Maj. Herman F. Smith, and Dr. Douglas Spencer,
was invaluable in the acquisition of data.

To these individuals, and to many others of whom space or
memory preclude the mention, the authors express their gratitude.

SUMMARY


A.  Problem 

1. To identify those items, in each of three new objectively scored
projective tests, which discriminate between superior and inferior
leaders among West Point cadets and enlisted trainees in Leaders Schools.

2.  To  determine  the  validity  and  reliability  of  the  resulting 
scoring  keys  for  the  assessment  of  leadership  in  new  samples  of  personnel. 

3. To compare the validity of these keys with that of a biographical
inventory currently used by the Army for leadership assessment.

4. To factor analyze the several tests found valid with West
Point cadets along with other leadership measures, in order to
investigate basic personality factors intrinsic to such measures.

B.  Method 

1.  The  Tests  — 

a. Picture Interpretation Test - involves selective identification
with individuals depicted in various roles and activities.

b. Army Picture Story Test - involves the ranking of statements
with regard to their appropriateness in describing each of
a series of pictures.

c. Picture Fill-In Test - entails the rating of appropriateness
of rejoinders in conversational situations depicted in
cartoons.

d. West Point Personal Inventory - a series of biographical
and self-descriptive questions, used with West Point cadets.

e. Leaders Self-Description Blank - a series of biographical
and self-descriptive questions, used with Leaders School trainees.

2.  The  Criteria  — 


a. The West Point Aptitude Rating was used as the measure of
leadership performance of West Point cadets. This is a composite
rating on leadership made by the cadet's peers and tactical officer.

b. The Associate Rating, mainly a nomination rating by peers,
was employed as the standard of leadership performance of Leaders
School trainees.

3. The three projective tests were administered to 454 West
Point cadets and 958 Leaders School trainees. The West Point Personal
Inventory was also administered to these cadets. Criterion data were
obtained for as many of these individuals as feasible.

4. All items on the three projective tests were biserially
correlated with the criterion. This was done for four groups of subjects,
as follows: two randomly selected groups of cadets, numbering 213 and
223, respectively; 385 privates enrolled in Leaders Schools; 228
non-commissioned officers enrolled at Leaders Schools.

5. Scoring keys were developed from this analysis, those items
being keyed which had criterion correlations minimally significant at
the 10% level of confidence.

6. The Picture Fill-In and Picture Interpretation Tests, and the
Leaders Self-Description Blank, were administered to a new sample of
296 privates enrolled at Leaders Schools. Criterion data were secured
for these individuals. Validity and reliability statistics were
computed for this group.
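
The item-analysis step described in the method above rests on the biserial correlation between a scored item response and the criterion. The sketch below shows the textbook biserial formula on a small made-up data set; it is an illustration only, since the report does not give its computing formula, and the data, function name, and variable names are all hypothetical.

```python
# Sketch: textbook biserial correlation between a dichotomous item
# response (1 = keyed response chosen, 0 = not chosen) and a continuous
# criterion score. The data below are invented for illustration.
from statistics import NormalDist, mean, pstdev


def biserial_r(item_responses, criterion_scores):
    """Biserial r, assuming a normal trait underlies the dichotomy."""
    chose = [y for x, y in zip(item_responses, criterion_scores) if x == 1]
    did_not = [y for x, y in zip(item_responses, criterion_scores) if x == 0]
    p = len(chose) / len(criterion_scores)  # proportion choosing the response
    q = 1.0 - p
    # Ordinate of the unit normal curve at the point dividing p from q.
    ordinate = NormalDist().pdf(NormalDist().inv_cdf(p))
    mean_difference = mean(chose) - mean(did_not)
    return (mean_difference / pstdev(criterion_scores)) * (p * q / ordinate)


# Hypothetical responses of eight men to one item, with criterion ratings.
item = [1, 0, 1, 0, 1, 0, 0, 1]
rating = [3, 1, 4, 2, 5, 3, 4, 2]

r_item = biserial_r(item, rating)
print(f"biserial r for this item: {r_item:.2f}")
```

In a keying procedure of the kind described, each item's coefficient would then be compared with the value required for significance at the 10% level, and only items passing that threshold would enter the scoring key.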

C. Results

1. In the two samples of West Point cadets, there was no better
than a chance relationship between responses to the items on all three
projective tests and the Aptitude Ratings received by the cadets. The
West Point Personal Inventory had a correlation of .35 with the
criterion in the two samples combined.

2. Similar negative results were obtained in the item analysis
of the tests against Associate Ratings of non-commissioned officers
enrolled in Leaders Schools.

3. In a sample of privates enrolled at Leaders Schools, it was
possible to identify in two of the three projective tests an appreciably
larger-than-chance number of items that distinguished between the higher-
and lower-rated men. These two tests were the Picture Interpretation and
Picture Fill-In Tests.

4. These tests, when scored for the new sample of privates by the
scoring key developed on the first group, yielded validity coefficients
of .25 for the Picture Interpretation Test, and .19 for the Picture
Fill-In Test. The Leaders Self-Description Blank had a validity
coefficient of .30 in this sample. Each of these coefficients differs
significantly from zero at the 1% level of confidence.

5. The split-half reliability coefficients, augmented by the
Spearman-Brown Prophecy formula, were .85 for the Picture Interpretation
Test and .91 for the Picture Fill-In Test in the cross-validation sample.

6. The correlations of the two tests with each other and with the
Leaders Self-Description Blank were all low and positive.

D. Conclusions

1. The three projective tests, as now constituted, are of no value
for leadership assessment of West Point cadets.

2. These tests are also of no value for leadership assessment
of non-commissioned officers in Leaders Schools.

3. Scoring keys were developed for both the Picture Fill-In
and Picture Interpretation Tests on a sample of privates in Leaders
Schools. Scoring the two tests for a new sample of privates by means
of these keys yielded scores which were significantly correlated with
the criterion of leadership in Leaders Schools.

4. The biographical inventories (West Point Personal Inventory
and Leaders Self-Description Blank) showed significant criterion
correlations in their respective samples.

5.  Among  the  privates,  the  two  valid  projective  tests  did  not 
add  appreciably  to  the  predictive  power  of  the  biographical  inventory 
when  combined  with  it  in  a multiple  regression  equation.  Nonetheless, 
their  correlations  with  the  inventory  are  low  (about  .35),  as  is  their 
correlation  with  one  another  (.18). 

6. It is inferred that the Picture Fill-In Test and the Picture
Interpretation Test show considerable promise as techniques for
leadership assessment, although improvements are needed to translate this
promise into a state of practical utility. Suggestions are made
manifest in this study as to how improvements may be effected in regard to:
(1) power to discriminate more accurately between superior and inferior
leaders, and (2) extending the range of personnel with whom such tests
would be useful.

7. In view of the lack of validity of the projective tests among
West Point cadets, it was not meaningful to proceed with the factor
analysis  designed  to  reveal  the  basic  personality  factors  common  to 
these  and  other  measures  of  leadership,  so  that  this  objective  of  the 
study  could  not  be  achieved. 
The identification of men with high potentialities as leaders is
understandably  a matter  of  prime  importance  to  the  Army.  Accordingly, 
a considerable  amount  of  research  has  been  done  or  sponsored  by  the 
Army  on  techniques  for  accomplishing  such  identification. 

Although  it  is  commonly  believed  that  non-intellective  factors  are 
of  major  importance  in  determining  a man’s  leadership  performance, 
methods for measuring such factors still leave much to be desired in
the way of validity and accuracy. In recent years, the evidence has
grown more suggestive that projective tests¹ may have promise along
these  lines.  However,  these  tests  are  typically  time  consuming  to 
administer  and  score,  and  typically  require  trained  psychologists  for 
their  interpretation.  These  characteristics  are  manifestly  unsuited 
for  large-scale  military  classification  purposes. 

To circumvent these deficiencies, the Personnel Research Section
of the Adjutant General's Office undertook the preparation of several
tests which are fundamentally projective in nature but which are
amenable to group administration and objective (even machine) scoring. When
any new test is constructed, the questions of its validity and what it
measures immediately arise. These questions become even more urgent
when the test represents a radically new departure. Thus, in the case
of  the  new  objective  projective  tests,  not  only  are  their  particular 
validities  unknown,  but  also  subject  to  question  are  the  issues  of  the 
general  fruitfulness  of  the  approach  and  of  the  underlying  psychological 
dimensions  measured  by  such  techniques. 

The research described in this report was undertaken in an effort
to shed light on these questions.

II. OBJECTIVES

More  specifically,  the  objectives  of  this  research  may  be 
described as follows:

A.  To  ascertain  the  validity  of  each  of  three  objective  projective 
tests  for  measuring  leadership  performance  of  Army  commissioned  personnel. 

1.  To  determine  the  correlation  of  each  item  with  a 
criterion  of  leadership  performance,  on  the  basis  of  which  to 
develop  a scoring  key  for  each  test. 


¹ A projective test requires the examinee to interpret or structure a
stimulus  situation  which  lends  itself  to  a variety  of  meanings,  and 
thereby  to  reveal  aspects  of  his  personality. 
2. To ascertain for each of these scoring keys its reliability
and validity against a criterion of leadership performance
at the level of commissioned personnel.

3. To compare the relative validities of these tests with
one another, and with a self-description questionnaire.


B.  To  ascertain  the  validity  of  each  of  three  objective  projective 
tests  for  measuring  leadership  performance  of  Army  non-commissioned 
personnel. 


1.  To  determine  the  correlation  of  each  item  with  a criterion 
of leadership performance, on the basis of which to develop a
scoring  key  for  each  test. 

2. To ascertain for each of these scoring keys its reliability
and validity against a criterion of leadership performance
at  the  level  of  non-commissioned  personnel. 

3.  To  compare  the  relative  validities  of  these  tests  with 
one another, and with a self-description questionnaire.


C. From these data, to infer the general promise of this type
of test, and to deduce indications of which lines of future
development seem most fruitful.

It was also hoped originally to factor analyze the relationships
among these tests, together with other personality measures
including ratings and behavior measures, with the objective of determining
basic personality factors gauged by such variables. Since the
non-test variables were more appropriate and available in the commissioned
personnel situation (West Point), the plan was to perform this
analysis in connection with the data obtained from that sample.

However,  it  was  discovered  in  the  course  of  the  research  that  the 
projective  tests  were  virtually  uncorrelated  with  the  leadership 
criterion in this situation, thus making the planned analysis
pointless. 
III. METHOD


The general plan of the study consisted of administering the
tests to samples of personnel who were representative of the two levels
of leadership activities for which such tests might be valuable
assessment techniques. An additional requirement for selecting the samples
was that the personnel be assigned to situations in which criteria of
leadership performance would be available.

In accordance with these standards, cadets in the upper classes
of the United States Military Academy at West Point were chosen as
the sample whose characteristics and activities were approximately
representative of personnel to be assessed for potential leadership
at the commissioned level.

Students at Leadership Schools were selected as suitable for
representing potential leaders at the non-commissioned level. This
group is actually composed of two subgroups, as regards age,
background, and previous experience: privates and non-commissioned
officers. It was deemed advisable to investigate separately the
validity of the tests for each of the two subgroups.

Thus, there were three categories of personnel who were the
subjects of the investigation: West Point cadets, privates assigned
to  Leadership  Schools,  and  non-commissioned  officers  assigned  to 
Leadership  Schools. 

The  research  design  involved  the  following  steps  for  each  of  the 
categories of personnel:

1. Administering the three objective projective tests to samples
of the personnel.

2. Collection of criterion data for these individuals.

3. Correlation of the test items against the criterion.

4. Development of a scoring key for each test.

5. Application of the key to the test results of new samples
of personnel.

6. Correlation of the scores on each test with the criterion
of leadership performance.

In the remainder of this chapter, the tests will first be
described, followed by a description, for the cadet officers, of the
samples, criterion, procedure for collecting data, and methods of analyzing
the data. Finally, the same rubrics of information will be presented
for the enlisted men.


A. Tests (Copies of the tests are included in the Appendix of
this report.)

The following tests were included in the validation study:

1. Picture Interpretation Test, 1949 (DA AGO PRT-1775)

This 432-item test consists of a series of 268 pictures, some of which
present individuals participating in military activities and others
involving individuals in civilian situations. The general directions
indicate that the test is a measure of interests, although it may be
considered a projective instrument to the extent that the examinee
tends to identify with the situations and individuals illustrated in
the pictures.

Instructions for the first six parts of the test follow the same
general pattern. For the individuals or situations presented in the
pictures in each part of the test, the examinee is required to choose
between two alternative reactions, as follows:

(1) Part I

(a) "Yes, I would like to do what he is doing," or

(b) "No, I would not like to do what he is doing."

(2) Part II

(a) "Yes, I would like to be that person," or

(b) "No, I would not like to be that person."

(3) Part III

(a) "Yes, this person is like me," or

(b) "No, this person is not like me."

(4) Part IV

(a) "Yes, I would admire this person," or

(b) "No, I would not admire this person."

(5) Part V

(a) "Yes, I am good at doing what this person is doing," or

(b) "No, I am not good at doing what this person is doing."

(6) Part VI

(a) "Yes, I like what is shown in this picture," or

(b) "No, I do not like what is shown in this picture."

Part  VII  differs  from  the  rest  of  the  test  in  that  pictures  of 
military  situations  and  civilian  situations  are  presented  along  with 
five descriptive statements for each picture. The examinee is required
to make the following choice in regard to each statement:

"Yes, the picture made me think of this idea," or
"No, the picture did not make me think of this idea."

2. Army Picture Story Test, Series B, 1950, Syracuse University
Press.


The Army Picture Story Test is an objective test, based on the general
idea  of  the  Thematic  Apperception  Test,  consisting  of  a series  of  ten 
pictures.  The  pictures  included  in  the  Army  Picture  Story  Test  involve 
both  military  and  non-military  situations  and  are  not  the  pictures  used 
in  the  Thematic  Apperception  Test.  For  each  picture,  there  are  thirty 
items  presented  in  groups  of  three.  The  items  are  relatively  short 
statements  which  are  descriptive  of  the  picture.  The  examinee  is 
instructed  to  read  the  statements  within  each  triad  and  to  select  two 
statements: the most descriptive and the least descriptive.

The  statements  used  in  this  test  were  obtained  by  administering 
the  set  of  ten  pictures  to  a large  group  of  soldiers  in  a free  response 
situation.  The  descriptions  written  by  this  group  were  edited  and 
arranged in triads on the basis of their frequency of occurrence and
with respect to a number of clinical categories. That is, triads were
composed of items which were approximately equal in frequency of
occurrence but which dealt with different personality needs.

3. Picture Fill-In Test, Second Form, 1949 (DA AGO PRT-1726).

The Picture Fill-In Test is an adaptation of the Rosenzweig Picture-
Frustration Test. It differs from the Rosenzweig test in that the
responses are obtained in objective form. A series of 43 cartoon-
like pictures is presented, comprising a total of 392 items. In each
picture, one individual is represented as saying something to another
individual. Some of the pictures deal with military situations, while
24 pictures were taken directly from the Rosenzweig test. In an
experimental administration of the Preliminary Form of the Picture
Fill-In Test, the examinees wrote responses in the cartoon balloons.
Responses made most frequently by this experimental group were selected
for each of the pictures. Certain responses which seemed to be
particularly revealing or measuring important factors also were included,
regardless of their frequency of occurrence. From seven to ten
responses were selected, and are presented below each picture in the
Second Form of the test. This form, which was used in the present
investigation, was developed so that it would be suitable for
objective scoring in the following manner: The instructions require that
the examinee rate each of the responses presented with the pictures
with respect to how likely it is that the person shown would give
that response. The rating of each response is accomplished on the
following three-point scale:


A.  "Might  say  something  like  this." 

B.  "Is  likely  to  say  something  like  this." 

C.  "Is  very  likely  to  say  something  like  this." 

4.  West  Point  Personal  Inventory,  1949  (DA  AGO  PRT-1756) 

(Also referred to as ROTC Self-Description Blank, Form II, 1949,
DA AGO PRT-1744, and in previous progress reports as Biographical
Information Blank, ROTC edition.)

The West Point Personal Inventory used in the present investigation
with cadets consists of four sections and a total of 420 items. This
test does not make use of pictorial material. The items are in the
form of statements concerning various characteristics, as follows:

Section I includes pairs of statements dealing with personal
characteristics; the individual is instructed to select the statement in
each pair that is the best description of him. In Section II, the
individual makes a choice between each of two activities as to which
he believes he can do better. Statements dealing with likes and
dislikes are presented in Section III, and the individual again
selects the statement in each pair that he likes the better. Section IV
contains statements describing personal characteristics, likes and
dislikes, abilities, and beliefs. For each statement, the individual
indicates whether the statement applies to him or does not apply.

5. Leaders' Self-Description Blank, Form gl.0, 1951, Syracuse
University Press.

The Leaders' Self-Description Blank is a 342-item version of the
Biographical Information Blank and was used at the non-commissioned
level at the Leaders' Schools in the present investigation. It is
similar in composition to the West Point Personal Inventory, but the
exact content of the items is different. Like the West Point Personal
Inventory, it does not present pictorial material and consists of four
sections.

Section I contains pairs of statements dealing with personal
characteristics. The examinee chooses the statement from each pair
that describes him better. The pairs of statements in Section II
describe various activities, and the individual selects the activity
which he can do better. Pairs of statements dealing with likes and
dislikes are presented in Section III, and the instructions require
selecting the statement that the examinee likes better. Personal
characteristics, likes and dislikes, abilities, and beliefs make up the
content of Section IV, and the examinee is instructed to indicate
whether each statement applies to him or does not apply.
B. Situational Validity at the Commissioned Officer Level

1.  Sample 

The first and second classes of cadets at the U. S.
Military Academy, West Point, in July 1950 were the subjects of the
investigation. A total of 454 cadets was tested.

Available  cases  from  this  sample  were  later  divided  into  two 
random  sub-groups  for  purposes  of  performing  a double  cross-validation 
analysis.  These  subgroups  comprised,  respectively,  213  and  223  cases. 

2. Tests¹

The tests employed with this sample were:

a. Picture Interpretation Test, 1949 (DA AGO PRT-1775)

b. Army Picture Story Test, Series B, 1950, Syracuse
University Press

c. Picture Fill-In Test, Second Form, 1949
(DA AGO PRT-1726)

d. West Point Personal Inventory, 1949 (DA AGO PRT-1756)

3.  Criterion 

The Aptitude for the Service System² was ascertained for
each cadet for use as a criterion measure of leadership. The Aptitude
for the Service System is used at West Point for the purpose of
providing an accurate evaluation of the leadership effectiveness of cadets.
The Aptitude Rating is a composite measure including the pooled opinion
of  the  cadet’s  Tactical  Officer  and  a small  group  of  classmates  within 
his  Company.  The  evaluation  by  his  classmates  is  accomplished  through 
an  associate  (buddy)  rating  procedure. 

Each cadet is ranked in order of merit by his Tactical Officer
and by the cadets in his Company in regard to the following definition
of leadership:

"The criterion of my appraisal is each cadet's ability (if or
when placed in command of a group) to elicit the group's maximum
cooperation; maintain the highest possible standards of administration and
discipline; and at the same time, develop and preserve high morale
and group spirit."

¹ A description of the tests used in the study is presented in
Section III, A.

² A detailed description of the Aptitude for the Service System may be
found in "The Operation and Administration of the Aptitude for the
Service System," U.S.M.A., West Point, New York: United States
Military Academy, 19L1.

From the raw ratings, the median ranking for each cadet is
determined and transposed to a standard score (called Army Standard
Rating). The Tactical Officer's rating is assigned a weight of one-
third in combining it with the associate ratings. It is this final
or composite Army Standard Rating (Aptitude Rating) that constituted
the criterion of leadership in this investigation.

4.  Procedure 

a.  Test  Administration 

On June 30 and July 1, 1950, the four tests were
administered in group situations to 216 cadets of the first and second
classes at the U. S. Military Academy, West Point. The test battery
was divided into two sessions, two of the tests being administered
in the first session and the other two tests being given in the
second session. Each session required about three hours of testing
time. A similar procedure was utilized when the second group of West
Point cadets was tested on July 27, 28 and 29, 1950. This second
group of cadets numbered 238 and was from the first and second
classes.


b.  Collection  of  criterion  data  and  constitution  of 
criterion  groups. 

Criterion data, entered on Hollerith cards, were
received from the West Point statistical office. These cards
contained the cadet serial number, the mean Aptitude Rating based on the
first term and the second term of the second class, Aptitude Ratings
for both terms, and year of expected graduation. The criterion of
leadership effectiveness utilized in this investigation with the West
Point sample was the mean Aptitude Rating which summarizes the cadet's
leadership performance during his second class.

The total sample was divided randomly on the basis of serial
numbers, group A being composed of those cadets with even serial
numbers, and group B having odd serial numbers. As a check on the
randomness of this procedure, t and F statistics were computed between
the mean Aptitude Index criterion scores of the two groups; this
analysis indicated that the two groups may be considered as random
samples from the same population in regard to the leadership criterion.
The purpose of fractionizing the sample in this manner was to make it
possible to perform a double cross-validation on the scoring keys
derived in the item analyses.


5. Analysis of Data
a. Item Analysis

The validity of the items in the experimental tests was
estimated by computing the biserial correlation coefficient between
dichotomized item responses and the Aptitude Rating criterion. The
computation of the item validities was facilitated by making use of
the Kolbe and Edgerton table for estimating biserial correlation
coefficients.¹
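
The biserial coefficient described above can be computed directly rather than looked up in a table. The following is a minimal sketch, assuming the standard textbook formula r_bis = ((M1 - M0) / s) * (pq / y), where y is the normal ordinate at the point dividing the distribution into proportions p and q; the function name and data layout are illustrative and are not taken from the report.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(item, criterion):
    """Biserial correlation between a dichotomized item (0/1 values)
    and a continuous criterion, using the standard textbook formula."""
    ones = [c for i, c in zip(item, criterion) if i == 1]
    zeros = [c for i, c in zip(item, criterion) if i == 0]
    if not ones or not zeros:
        raise ValueError("both response categories must be non-empty")
    sd = pstdev(criterion)
    if sd == 0:
        raise ValueError("criterion has no variance")
    p = len(ones) / len(item)          # proportion giving the keyed response
    q = 1.0 - p
    z = NormalDist().inv_cdf(p)        # cut point on the normal curve
    y = NormalDist().pdf(z)            # ordinate at the cut point
    return (mean(ones) - mean(zeros)) / sd * (p * q / y)
```

Because the formula assumes a normally distributed trait underlying the dichotomy, values outside the range -1 to +1 can occur with small or markedly non-normal samples.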

The Aptitude Rating criterion was normalized by dividing the
distribution into equal frequency eighths, and assigning the standard
score equivalent of the mid-point of each eighth in a normal
distribution to each criterion score within that eighth. Thus, all cases
falling in a given eighth of the obtained distribution of criterion
scores received the same standard score equivalent.
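
The normalization into eighths can be sketched as below. Two assumptions are made that the report does not state: the "mid-point" of each eighth is taken as the midpoint in cumulative proportion (1/16, 3/16, and so on), and tied scores are assigned by sorted position, so ties at a boundary may fall in adjacent eighths. The function name is illustrative.

```python
from statistics import NormalDist


def normalize_to_eighths(scores, groups=8):
    """Replace each score by the normal deviate at the midpoint of the
    equal-frequency group (by default, eighth) in which it falls."""
    n = len(scores)
    # Indices of the cases, ordered from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])
    # Normal deviate at the proportion midpoint of each group.
    z_mid = [NormalDist().inv_cdf((g + 0.5) / groups) for g in range(groups)]
    normalized = [0.0] * n
    for rank, idx in enumerate(order):
        group = rank * groups // n     # 0 for the lowest eighth, 7 for the highest
        normalized[idx] = z_mid[group]
    return normalized
```

With this convention the lowest eighth receives a standard score of about -1.53 and the highest about +1.53.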

While the same general item analysis procedure was followed for
the West Point study, somewhat different techniques of dichotomizing the
item responses were necessary for the different tests, as follows:

(1) Picture Interpretation Test, 1949 (DA AGO PRT-1775)

The item responses in this test fit a natural dichotomy since the
examinee is instructed to indicate either "Yes" or "No" for each item.
Thus, there is no problem in dichotomizing the responses for the
purposes of the biserial correlation type of item analysis.

(2) Army Picture Story Test, Series B, 1950, Syracuse
University Press. As described in Section III, A, Tests, the Army
Picture Story Test requires that the individual choose the most
descriptive and the least descriptive statements from groups of three
items. Within the triad, the item that is considered to be most
descriptive is marked A, while the item that seems to be least
descriptive is marked B, and the intermediate item is not marked. For
purposes of obtaining item frequencies, I.B.M. graphic item counts
were made for each item for the A or B alternatives. The
trichotomous alternatives for each item were dichotomized in order to apply
the biserial correlation item analysis technique; in doing this, the
extreme alternative ("Best" or "Worst") having the larger frequency
of response was used as one category of the dichotomy, while the
combination of the other extreme with the intermediate alternative
constituted the other category. This arrangement was used in order
to yield the closest approximation to a 50%-50% dichotomy, thus
maximizing the stability of the resulting item validity coefficients.
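
The dichotomizing rule for the triad items can be illustrated with a short sketch. The response coding ("A" for most descriptive, "B" for least descriptive, an empty string for unmarked) and the tie-breaking choice in favor of "A" are assumptions made for the example.

```python
def dichotomize_triad_item(responses):
    """Collapse three-way responses to one item into a 0/1 dichotomy.

    The extreme alternative ("A" or "B") chosen more often becomes the
    keyed category (1); the other extreme and the unmarked responses
    together form the remaining category (0).
    """
    n_a = responses.count("A")
    n_b = responses.count("B")
    keyed = "A" if n_a >= n_b else "B"   # tie goes to "A" (assumption)
    return keyed, [1 if r == keyed else 0 for r in responses]


if __name__ == "__main__":
    keyed, coded = dichotomize_triad_item(["A", "A", "B", "", "A", ""])
    print(keyed, coded)
```

Keying on the more frequent extreme keeps the two categories as close to equal size as the item's response distribution allows, which is the stated purpose of the rule.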


¹ Kolbe, L. E., and Edgerton, H. A., "A Table for Computing Biserial
r," J. Exp. Educ., 1936, 4, 245-251.


(3) Picture Fill-In Test, Second Form, 1949 (DA AGO
PRT-1726). This test requires that the individual rate each item
on a three-point scale in regard to the degree of likelihood that
the item is an appropriate statement, as explained in Section III, A,
Tests. The dichotomy required by biserial item analysis was achieved
by combining the B and C responses. Thus, the frequencies of responses
were obtained for the A category vs. the combined B and C category.

The basis for grouping the B and C responses rather than utilizing
some other combination in order to dichotomize the responses was both
logical and empirical. On a priori grounds, it seemed more reasonable
to believe that B (Is likely to say something like this) is closer on
a continuum to C (Is very likely to say something like this) than to A.
Moreover, an inspection of the item responses indicated that by using the
dichotomy of A vs. B and C, the ideal 50%-50% dichotomy was more
closely approximated.
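
The empirical half of this argument amounts to comparing the candidate splits of the three-point scale and seeing which comes nearer an even division. A small sketch of that comparison follows; the function name and the response counts in the usage example are hypothetical and are not data from the study.

```python
def split_closest_to_even(counts):
    """Given response counts for categories 'A', 'B' and 'C', return the
    two-way split whose first category is nearest to 50% of responses,
    together with that category's proportion."""
    n = counts["A"] + counts["B"] + counts["C"]
    if n == 0:
        raise ValueError("no responses")
    candidates = {
        "A vs. B+C": counts["A"] / n,
        "A+B vs. C": (counts["A"] + counts["B"]) / n,
    }
    # On an exact tie, the first listed split (A vs. B+C) is returned.
    label = min(candidates, key=lambda k: abs(candidates[k] - 0.5))
    return label, candidates[label]


if __name__ == "__main__":
    # Hypothetical counts for one response to one picture.
    print(split_closest_to_even({"A": 45, "B": 35, "C": 20}))
```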

(4) West Point Personal Inventory, 1949 (DA AGO PRT-1756).

An  item  analysis  of  this  test  was  not  necessary  since  the  scoring  key 
had  already  been  developed  in  a previous  study  and  was  made  available 
by  the  Personnel  Research  Section,  AGO,  for  the  validation  phase  of 
the  present  investigation. 

b.  Pattern  Item  Analysis 

Since  the  items  of  the  Army  Picture  Story  Test  are  grouped 
in  triads,  it  was  hypothesized  that  the  pattern  of  responses  might  be 
significant.  In  order  to  investigate  this  hypothesis,  the  following 
pattern analysis was performed: For each triad, six patterns of
responses are possible. For Group A of the West Point sample, frequency
counts were made of the responses to each of the six patterns for each
of the 100 triads. A significance test based on χ² was made
among  the  frequencies  of  the  patterns  for  each  triad,  contrasting 
upper  and  lower  criterion  groups.  This  procedure  made  it  possible  to 
estimate  the  validities  of  the  pattern  responses. 
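
A chi-square test of this kind, contrasting upper and lower criterion groups across the six response patterns of one triad, can be sketched as follows. The sketch assumes the usual Pearson chi-square for a contingency table; the counts in the usage example are hypothetical. For a 2 x 6 table there are 5 degrees of freedom, for which the 5% critical value is 11.07.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:           # skip cells in empty rows or columns
                stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


if __name__ == "__main__":
    # Hypothetical pattern frequencies for one triad.
    upper = [30, 12, 18, 9, 21, 10]
    lower = [14, 20, 16, 22, 11, 17]
    stat, df = chi_square([upper, lower])
    print(round(stat, 2), df, stat > 11.07)
```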

c.  Cross-Validation 

In  general,  the  validities  of  the  tests  were  estimated  by 
computing  Pearson  product-moment  correlation  coefficients  between 
the scoring keys derived and the Aptitude Rating leadership criterion.

In  following  this  procedure,  the  two  samples,  group  A (even  serial 
numbers)  and  group  B (odd  serial  numbers)  were  treated  separately,  in 
order  that  the  scoring  key  derived  on  group  A could  be  crossed  over 
and  validated  on  Group  B,  and  the  scoring  key  obtained  on  group  B could 
be validated on group A. This double cross-validation technique makes
use of the principle of replication in determining which items are
consistently  valid  in  both  samples  and  permits  two  minimum  estimates 
of  the  validity  of  the  test. 
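
The double cross-validation design can be sketched in a few lines. This is a simplified illustration, not the procedure actually used: items are keyed by the sign of a Pearson (point-biserial) item-criterion correlation instead of the biserial coefficient, the selection threshold of 0.1 is arbitrary, and all function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation; returns 0.0 if either
    variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def build_key(items, criterion, threshold=0.1):
    """Key each item +1 or -1 by the sign of its criterion correlation,
    keeping only items whose absolute correlation reaches threshold.
    items is a list of per-person lists of 0/1 responses."""
    key = {}
    for j in range(len(items[0])):
        r = pearson([person[j] for person in items], criterion)
        if abs(r) >= threshold:
            key[j] = 1 if r > 0 else -1
    return key


def score(items, key):
    """Sum of keyed item weights for each person."""
    return [sum(w * person[j] for j, w in key.items()) for person in items]


def double_cross_validate(items_a, crit_a, items_b, crit_b, threshold=0.1):
    """Derive a key on each group and validate it on the other group."""
    key_a = build_key(items_a, crit_a, threshold)
    key_b = build_key(items_b, crit_b, threshold)
    r_a_on_b = pearson(score(items_b, key_a), crit_b)
    r_b_on_a = pearson(score(items_a, key_b), crit_a)
    return r_a_on_b, r_b_on_a
```

Each returned coefficient comes from a group that played no part in deriving the key being tested, so neither benefits from chance item selection within its own sample.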


¹ For a fuller discussion of the double cross-validation technique,
see Katzell, R. A., "Cross-Validation of Item Analyses," Educ. Psychol.
Measmt., 1951, 11, 16-22.
C. Enlisted Personnel Validity Studies

1. Samples

a. Item Analysis Group

The enlisted personnel samples in this study were
drawn from Army Leader's Schools whose mission is to train
personnel for leadership, primarily at the non-commissioned
officer level. It is anticipated that the bulk of trainees will
consist of privates who have just completed basic training and
who have been recommended by their company officers as evidencing
leadership potential. This source of students is called
"pipeline". Other sources have included reenlistments and National
Guard personnel called up for active duty as a result of the
Korean conflict. For these groups, training at the Leader's
Schools is considered a refresher course. One other important
category includes officer candidates who, at present, are
required to complete a leadership course before attending Officer
Candidate Schools. Leaders Schools at Ft. Dix, N. J., and
Ft. Knox, Ky., which train soldiers from ground force units, were
visited to gather the data for item analysis. A total of 958 men
were tested at these two installations.

The sample was divided into two subgroups: privates (including
privates first class) and non-commissioned officers. These
groups  differ  in  average  age  and  military  background,  factors 
which  might  affect  performance  on  the  tests  and  criteria.  Hence, 
it  was  considered  desirable  to  perform  separate  item  analyses  and 
validations  for  the  two  subsamples. 

b.  Cross-Validation  Group 

Leaders Schools at Ft. Jackson, S. C., and Ft.
Belvoir, Va., were visited to secure the data for cross-validation.
These  schools  train  personnel  from  infantry  and  engineering  units, 
respectively.  368  cases  were  utilized  for  the  cross-validation 
results. 

2.  Tests 

a.  Item  Analysis  Group 

At Ft. Dix and Ft. Knox, the Picture Interpretation
Test, 1949 (DA AGO PRT-1775), Picture Fill-In Test, Second Form,
1949 (DA AGO PRT-1726), and Army Picture Story Test, Series B,
1950, Syracuse University Press, were administered.
b. Cross-Validation Group

On the basis of the item analysis performed, it was
decided to administer only the Picture Interpretation Test and the
Picture Fill-In Test to the cross-validation sample. In addition,
at the request of PRS, the Leaders' Self-Description Blank, Series J2,
1951, Syracuse University Press, was administered to this same group.

Descriptions of the nature of all the above tests may be found
in  Section  III,  Part  A,  of  this  report. 

3.  Criteria 

a. Item Analysis Group

The training cycle at Leader's Schools is divided into
two four-week phases, Phase I and Phase II. Training during Phase I
is primarily academic in nature, whereas Phase II consists primarily
of practical leadership experience in field situations. During the
training program, soldiers are periodically evaluated for their
performance on different criteria by various kinds of raters, i.e., both
commissioned and non-commissioned cadre as well as by their peers.

All  soldiers  tested  in  this  study  were  in  Phase  I of  the  training 
cycle  at  the  time  of  testing. 

The following criterion measures of leadership were obtained
for the group tested at Ft. Dix and Ft. Knox: Faculty Board Rating,
Associate Rating, Leaders' Reaction Test, Rating of Phase II
Performance, and Total Rating (a weighted combination of the foregoing).

Intercorrelations among the above criteria were computed for a
sample of the soldiers tested at Ft. Dix and Ft. Knox. These
statistics  are  useful  for  estimating  the  extent  to  which  the  criteria 
measure  different  aspects  of  leadership.  The  following  table  shows 
these  results.  (See  Table  1). 

In both samples, it seems evident that the various criteria are
somewhat unrelated to each other. Although the Associate Rating
also appears to be somewhat different from other ratings of
leadership potential, on the basis of its recommendation by Personnel
Research Section for use in this study, it would seem to be the most
appropriate measure of leadership available. Results from other
Personnel Research Section studies* had shown associate ratings to be
superior measures of leadership. In the present study, furthermore,
Associate Ratings were available for more subjects tested than any of
the other ratings.


* Wherry, Robert H. and Fryer, Douglas H., "Buddy Ratings: Popularity
Contest or Leadership Criteria?", Personnel Psychology, 1949,
147-169.

Table 1. Intercorrelations among Various Criteria for Two Enlisted
Samples.

A. Fort Dix (N = 173)

                               F.B.R.   L.R.T.   A.R.   Perf. II   Total

Faculty Board Rating             --      .14     .45      .15       .74
Leaders Reaction Test                    --      .04      .03       .37
Associate Rating                                 --       .06       .44
Performance during Phase II                               --        .66
Total                                                               --

Correlation between Phase II and sum of other three variables = .20

B. Fort Knox (N = 113-162)

                               F.B.R.   L.R.T.   A.R.   Perf. II   Total

Faculty Board Rating             --      .32     .15      .33       .78
Leaders Reaction Test                    --      .00      .38       .50
Associate Rating                                 --      -.12       .20
Performance during Phase II                               --        .78
Total                                                               --

Correlation between Phase II and sum of other three variables = .34

All of the above considerations led to the decision to use Associate
Ratings as the criterion for the enlisted sample of this study.

Since this criterion was adopted, it will be valuable to describe
in somewhat greater detail the operations by which scores on this measure
were obtained for the samples. Each student is evaluated by his fellow
students at the end of Phase I training. The students are each given a
"Student Leadership Evaluation Report-Rating Sheet". On this sheet is
a roster of the men in the student's group, customarily numbering from
nine to fifteen men. The student, from this roster, chooses those whom
he thinks the three best leaders and the three poorest leaders. On the
next day, each student is given a "Student Leadership Evaluation
Report-Description Sheet". On this sheet are printed the names of the
men in the group. There are also ten pairs of descriptive statements.
For each man on the roster, the student is to choose the description in
each pair of statements which most appropriately describes the man being
rated. These sheets are then scored by using the keys furnished by
Personnel Research Section. One score is based on the nominating
technique, weights being given to the number of nominations received. The
more nominations a soldier receives which are indicative of better
leadership, the higher his Associate Rating score. The other score is
derived from weights based on empirical evidence as to which
descriptions are more characteristic of better leaders. The scores from
the rating sheets and description sheets are averaged. These scores
constituted the Associate Rating criterion used for item analysis
purposes.

b. Cross-Validation Group

Associate Ratings were also obtained for the soldiers
tested at Ft. Jackson and Ft. Belvoir. The men were rated in the
period from about January through March, 1952. Test scores were
obtained by use of the keys developed from the item analysis group.
These scores were correlated with the Associate Ratings.

4.  Procedure 

a.  Item  Analysis  Group 

Trainees in Phase I at Ft. Dix were tested in August
and October of 1950 and January 1951. The Picture Interpretation,
Picture Fill-In, and Army Picture Story Tests were administered to
groups ranging in size from approximately 50 to 100 trainees. Every
effort was made to elicit the cooperation of the soldiers tested,
including some explanation of the purpose of the study. A total of
480 subjects in Phase I was tested at Ft. Dix. In field trips made
during October 1950 and January 1951, 478 subjects in Phase I were
tested at Ft. Knox. Thus, the total number of trainees tested for
the item analysis group was 958.

b.  Cross-Validation  Group 

Trainees in Phase I at Ft. Jackson, S. C., and
Ft. Belvoir, Va., were tested in January, 1952, under conditions
similar to those obtaining for the item analysis group. On the
basis of the results of the item analysis, it was decided to
administer only the Picture Interpretation and Picture Fill-In tests to
these groups. An additional test was administered at the request of
Personnel Research Section, the Leader's Self-Description Blank. At
Ft. Jackson, the number of privates whose test papers were adequately
filled out was 166; non-commissioned officers numbered 50. At
Ft. Belvoir, test papers from 143 privates were acceptable; the
non-commissioned officer sample numbered 9. The total number for all
three tests included 309 privates and 59 non-commissioned officers.

Table  2 summarizes  the  number  of  cases  used. 


Table 2. Number of Cases in Item-Analysis and Cross-Validation Samples.

A. Number of Cases in the Item Analysis Group

               Ft. Dix     Ft. Knox     Total

Privates         246         139         385
Non-coms          90         138         228

B. Number of Cases in the Cross-Validation Group

               Ft. Jackson   Ft. Belvoir   Total

Privates          166           143         309
Non-coms           50             9          59

5.  Analysis 

a. Item Analysis Group

For the purpose of item analyzing the three tests used,
it was desired to combine the installation samples, since the
resulting keys would be used irrespective of installation. The following
analyses were performed in order to determine the most appropriate
statistical method for combining the Associate Rating scores from the
two installations.

Critical ratios were computed comparing mean Associate Rating
scores obtained by soldiers at Ft. Dix with those at Ft. Knox. The
differences between installations were significant at the 1% level of
confidence.

Variance ratios were computed for these data to test the
significance of differences in variability for Associate Rating scores.
Differences in variance were significant at the 2% level of confidence
for non-coms at Knox versus non-coms at Dix. This significant
difference in variability makes ambiguous the interpretation of tests of
significance for mean differences reported, since significant critical
ratios between means may arise because of differences in variability.

The following tables present data from which the above
interpretations were made.

Table 3. Critical Ratios and Variance Ratios for Testing Differences
in Mean Associate Rating Scores

A. Privates

                        Fort Dix     Fort Knox

N                         247          139
Mean                     79.16        72.44
Standard Deviation        4.79         4.02
Critical Ratio                 14.6***
Variance Ratio                  1.42*

B. Non-commissioned Officers

                        Fort Dix     Fort Knox

N                          90          137
Mean                     80.44        72.30
Standard Deviation        3.94         5.12
Critical Ratio                 13.6***
Variance Ratio                  1.68**

*** Significant at the 1% level of confidence
**  Significant at the 2% level of confidence
*   Significant at the 10% level of confidence
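The critical ratios and variance ratios in Table 3 can be approximately reproduced from the tabled means, standard deviations, and N's. The sketch below assumes the conventional large-sample formulas (difference between means divided by the standard error of the difference, and larger variance over smaller variance); the report does not state the exact computing formulas, so small discrepancies from the printed values are to be expected. The function names are illustrative only.

```python
import math

def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between two means divided by the standard error of the difference."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff

def variance_ratio(sd1, sd2):
    """Larger sample variance divided by the smaller sample variance."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    return max(v1, v2) / min(v1, v2)

# Privates, Fort Dix vs. Fort Knox (means, S.D.'s and N's from Table 3).
cr_privates = critical_ratio(79.16, 4.79, 247, 72.44, 4.02, 139)
vr_privates = variance_ratio(4.79, 4.02)

# Non-commissioned officers, Fort Dix vs. Fort Knox (from Table 3).
cr_ncos = critical_ratio(80.44, 3.94, 90, 72.30, 5.12, 137)
vr_ncos = variance_ratio(3.94, 5.12)

print(f"Privates: CR = {cr_privates:.2f}, variance ratio = {vr_privates:.2f}")
print(f"Non-coms: CR = {cr_ncos:.2f}, variance ratio = {vr_ncos:.2f}")
```

Under these assumptions the computed values fall within about 0.1 of the tabled critical ratios and within about 0.01 of the tabled variance ratios.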


On the basis of the preceding analyses, the best procedure for
combining the two installations seemed to be conversion of Associate
Rating raw scores to standard scores within each installation before
pooling the two. Although about 950 men had been tested on each of the
three experimental tests administered, attrition in the number of cases
had occurred as a result of improperly answered tests and also by the
inability to secure criterion measures on some of the subjects. The
graphic item counts are based on a sample of 385 privates and 228
non-commissioned officers. From the graphic item counts, biserial r's
were computed for each item of each of the three tests administered.
The method by which these were computed was analogous to that used for
the cadet officer sample.
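The conversion to standard scores within each installation before pooling can be sketched as follows. This is a minimal illustration, not the report's computing procedure: the raw scores are hypothetical, the function name is invented for the example, and the population standard deviation is assumed as the divisor.

```python
from statistics import mean, pstdev

def standardize_within(scores_by_installation):
    """Convert raw scores to z-scores within each installation, then pool them."""
    pooled = []
    for installation, scores in scores_by_installation.items():
        m = mean(scores)
        s = pstdev(scores)  # assumption: population S.D. used as the divisor
        pooled.extend((installation, (x - m) / s) for x in scores)
    return pooled

# Hypothetical raw Associate Rating scores (not data from the study).
raw = {
    "Ft. Dix": [74.0, 78.0, 80.0, 84.0],
    "Ft. Knox": [67.0, 71.0, 73.0, 77.0],
}
pooled = standardize_within(raw)
for installation, z in pooled:
    print(f"{installation}: {z:+.2f}")
```

Because each installation is centered on its own mean and scaled by its own variability, the installation differences in mean and variance noted in Table 3 no longer affect the pooled criterion.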

The Associate Rating criterion was normalised by dividing the
distribution into equal frequency eighths, and assigning the standard
score equivalent of the midpoint of each eighth in a normal
distribution to the criterion scores within that eighth.
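A sketch of this normalising step is given below. It assumes that "midpoint" means the percentile midpoint of each eighth (6.25%, 18.75%, and so on), which is one reasonable reading; the report does not give the actual standard score values used. Function names are illustrative, and ties are broken by order of appearance.

```python
from statistics import NormalDist

def eighth_standard_scores(n_groups=8):
    """Unit-normal deviate at the percentile midpoint of each equal-frequency group."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((k + 0.5) / n_groups) for k in range(n_groups)]

def normalise(scores, n_groups=8):
    """Rank the scores, cut them into equal-frequency groups, and give every
    score in a group that group's standard score equivalent."""
    values = eighth_standard_scores(n_groups)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    out = [0.0] * len(scores)
    for rank, i in enumerate(order):
        group = rank * n_groups // len(scores)
        out[i] = values[group]
    return out

eighth_values = eighth_standard_scores()
print([round(v, 2) for v in eighth_values])
```

The eight values are symmetric about zero, so the normalised criterion has a mean of zero when the eighths contain equal numbers of cases.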

Graphic item counts for the three projective tests were obtained
separately for the samples of privates and non-commissioned officers.
Each of these two samples had been fractionized into eight subsamples
of equal frequency, after first arranging the cases in descending
order according to their Associate Rating standard scores.

While the same general item analysis procedure was followed for
the three tests, somewhat different techniques for dichotomizing the
item responses were necessary for the different tests. For the method
of dichotomizing responses used for the different tests, see Section B,
part 6, under cadet officers.


Scoring keys were developed for those tests with promising
validity based on the item analysis results. Items were selected whose
biserial r's were significant at the 10% level of confidence. The
value of biserial r necessary for significance was determined from
the standard error of biserial r computed from the following formula:¹

$$SE_{r_{bis}} = \frac{\dfrac{\sqrt{pq}}{z} - r_{bis}^{2}}{\sqrt{N}}$$


Significant biserial r's are a function of the percentage of cases in
each dichotomized group, as well as the confidence level adopted. For
a 50% dichotomous split, an r of .105 was required for significance
at the 10% level of confidence in the sample of privates. For the
non-commissioned sample, an r of .137 was required for significance at
this level of confidence.
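The two required values quoted above can be checked with a short computation. The sketch assumes the usual form of the standard error of biserial r, (sqrt(pq)/z - r^2)/sqrt(N), where p and q are the proportions of cases in the two dichotomized groups, z is the ordinate of the unit normal curve at the point of dichotomy, and N is the number of cases. It further assumes that the standard error is evaluated at r = 0 and that the 10% level is two-tailed; the report does not state these details, but under them the computation agrees with the figures of .105 and .137.

```python
import math
from statistics import NormalDist

def se_biserial(p, n, r_bis=0.0):
    """Standard error of biserial r for a dichotomy with proportion p in one group."""
    std_normal = NormalDist()
    q = 1.0 - p
    z = std_normal.pdf(std_normal.inv_cdf(p))  # ordinate at the point of dichotomy
    return (math.sqrt(p * q) / z - r_bis ** 2) / math.sqrt(n)

def required_r(p, n, level=0.10):
    """Smallest biserial r significant at the given two-tailed level (assumed)."""
    critical_deviate = NormalDist().inv_cdf(1.0 - level / 2.0)
    return critical_deviate * se_biserial(p, n)

r_privates = required_r(0.5, 385)  # sample of privates
r_noncoms = required_r(0.5, 228)   # sample of non-commissioned officers
print(f"privates: {r_privates:.3f}, non-coms: {r_noncoms:.3f}")
```

For splits other than 50%, the ratio sqrt(pq)/z grows, so a larger r is needed for significance at the same level.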


One key was developed for the Picture Interpretation Test in
addition to those obtained as above. This test was selected for
special study in an effort to discover the nature of those items which
yielded significant biserial r's. Two judges classified the
significant items from this test into 13 categories suggested by the kinds of
pictures which produced significant responses. Those non-significant
items whose content fitted the classification scheme adopted were also
placed into these categories. It was reasoned that if the classifications
used were indicative of real relationships between item content and
criterion, non-significant items in the same classification might show
correlations with the criterion having the same direction as that found
for the statistically significant items classified in the same category.

To check on the extent to which non-significant items were
predicted with correct signs for the various categories, the following
analysis was performed. The proportions of positive and negative
biserial r's among all non-significant items classified were determined.
Likewise the proportion of positive or negative items allocated to each
category was determined. The differences between proportions in each
category and in the total were then tested for significance. If these
differences were significant, it was concluded that the classification
of items within these categories was meaningful. By this procedure, an
item classification key of 89 items was developed. The item-analysis
key for this test contained 99 items.

Certain trainees had been eliminated from the item analysis
because they lacked Associate Rating scores as a result of having been
dropped, for various reasons, from the Leader's Course. There were
25 such cases who had complete sets of tests. Mean scores were
obtained for this group on the Picture Interpretation Test and the
Picture Fill-In Test by scoring their papers with the item analysis
keys developed as described previously.


¹ Kelley, T. L., Fundamentals of Statistics. Cambridge: Harvard, 1947,
p. 375.

It was desired to compare the mean score for dropouts to the mean
score of the total item analysis group. It was postulated that, if a
positive correlation existed between criterion and test scores,
dropout mean scores on the tests would be significantly lower than the
means of the item analysis group. This assumes that dropouts, had
they been rated, would have received relatively low Associate Rating
scores.

In order to estimate the mean test scores of the item analysis
group, it was necessary to score their papers with the item analysis
keys. Rather than scoring test papers for the entire sample of 385
privates, 50 cases were selected at random from this group. A
stratified random sampling technique was used since the 385 privates
had been fractionated into eighths on the basis of Associate Ratings
for item analysis purposes (see page 9). Furthermore, for both this
group and the dropouts, a different keying of responses was used to
simplify scoring than that used later for the cross-validation sample.
Since the keying of items is arbitrary, the two keying methods used
result in scores which differ only by a constant.

For testing the significance of the differences between means of
the dropout and item analysis group, t tests were computed. The
standard error for the difference between means was adjusted in the
t formula to take account of the use of a stratified random sample.

b. Cross-Validation Group

Using the item-analysis keys, the Picture Fill-In and Picture
Interpretation Tests were scored for the cross-validation sample. The
Picture Interpretation Test was also scored for the item classification
key described above. The Leader's Self-Description Blank was scored
by using the key furnished by the Personnel Research Section.

Reliability coefficients were computed for the Picture
Interpretation and Picture Fill-In tests. The method of computation used was
the correlation between scores from odd-and-even numbered items,
augmented by the Spearman-Brown prophecy formula to estimate the
reliability of the whole test.
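The odd-even procedure can be sketched as follows. The Spearman-Brown step is the standard formula for doubling test length, r_full = 2 r_half / (1 + r_half); the item responses shown are hypothetical, and the function names are chosen for this illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Estimated reliability of the whole test from the half-test correlation."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """item_scores holds one list of keyed item scores per examinee."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))

# Hypothetical keyed responses for five examinees on six items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 0],
]
reliability = split_half_reliability(responses)
print(round(reliability, 3))
```

The correction is needed because each half contains only half the items, and a shorter test is less reliable than the full-length test whose reliability is wanted.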


¹ McNemar, Q., Psychological Statistics. John Wiley and Sons, Inc.:
New York, 1949, pp. 333-336.


Validity coefficients were computed for the cross-validation
sample. Scores on each of the three tests, obtained as described
above, were correlated with the Associate Rating criterion. The
correlations were computed separately for the Ft. Jackson and Ft.
Belvoir samples, and for the two combined.

Before combining the two installations into a total sample, it
was advisable to test whether mean Associate Rating scores for the
two installations differed significantly. A critical ratio was
computed testing the significance of this difference. If this ratio were
not statistically significant, Associate Rating scores would not be
converted to standard scores for computation of the validity
coefficients.

To  test  whether  the  validity  coefficients  differed  significantly 
for  the  two  installations,  critical  ratios  were  computed  for  the 
difference  between  two  sample  correlation  coefficients.  An  r to  z 
transformation  was  made  prior  to  this  statistical  test. 
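The test for a difference between two independent correlation coefficients is conventionally carried out with Fisher's r to z transformation, as sketched below. The sample sizes are the numbers of privates reported for Ft. Jackson and Ft. Belvoir; the two correlation values are hypothetical and are used only to show the arithmetic.

```python
import math

def fisher_z(r):
    """Fisher's r to z transformation, z = arctanh(r)."""
    return math.atanh(r)

def critical_ratio_r_difference(r1, n1, r2, n2):
    """Critical ratio for the difference between two independent correlations."""
    se_diff = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se_diff

# Hypothetical validity coefficients; N's are the privates at each installation.
cr = critical_ratio_r_difference(0.30, 166, 0.20, 143)
print(round(cr, 2))
```

The transformation is used because the sampling distribution of r is skewed when the population correlation is not zero, whereas z is approximately normal with a standard error that depends only on N.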

Two multiple correlation coefficients were computed, by the
Wherry-Doolittle method, with Associate Ratings as the criterion
variable in both cases, and as the predictor variables (1) Picture
Interpretation Test and Picture Fill-In Test, and (2) Picture
Interpretation Test, Picture Fill-In Test, and Leader's Self-Description
Blank. Only those trainees who had completed all three tests and had
received an Associate Rating were utilized for this analysis.

IV. RESULTS


A. Results with Cadet Officers

1. Picture Interpretation Test, 1949 (DA AGO PRT-1775)

a. Item Analysis¹

The proportion of items with statistically significant
validities failed to exceed chance expectancy, indicating a lack of
validity for the test with this sample. A scatterplot was prepared
showing the relationship of the obtained validity coefficients of
Group A (N = 211) vs. the corresponding coefficients of Group B
(N = 223). The correlation between the two sets of validity
coefficients was approximately zero, indicating little consistency of item
validity from sub-sample to sub-sample. This finding, together with
the low proportion of significantly valid items, strongly suggests
that the test does not possess sufficient validity for the prediction
of leadership with West Point cadets.

A qualitative  investigation  of  those  items  for  which  combined 
validities  exceeded  chance  expectancy  did  not  yield  logical  categories 
or  trends  which  were  considered  to  be  psychologically  meaningful. 

Table  4 shows  the  distribution  of  the  item  validities  in  the 
Picture  Interpretation  Test. 

b.  Cross-Validation 

To substantiate the evidence from the item analysis, the
validity of the Picture Interpretation Test was estimated by computing
the correlation coefficient between the scoring keys derived on the
item analysis samples and the Aptitude Rating leadership criterion.
For Group A (N = 210) the scoring key yielded a correlation coefficient
of .12. For Group B (N = 222) the correlation coefficient was found
to be .07, indicating the lack of appreciable validity of this test for
these samples.


¹ Item validities have been reported in detail for each of the tests in
tables included in regular monthly progress reports submitted to the
Department of the Army during the course of the study. Slightly
different N's from sample to sample and from test to test are the
result of incomplete data on a few cases in the sample tested.

Table 4. Distribution of Item Validities in the Picture Interpretation
Test for Two West Point Samples.

     Group A (N = 211)            Group B (N = 223)

       r           f                r           f

    .00-.04       172            .00-.04       152
    .05-.09       100            .05-.09       140
    .10-.14        87            .10-.14       166
    .15-.19        38            .15-.19        45
    .20-.24        16            .20-.24        15
    .25-.29        11            .25-.29        11
    .30-.34         4            .30-.34         2
    .35-.39         1            .35-.39         0
                                 .40-.44         0
                                 .45-.49         1
    Total         429 items*     Total         432 items

* The total number of items for which it was possible to compute
validity coefficients was 429 in Group A, since three items (no. 184,
344, and 362) yielded no responses in one category of the dichotomy.

2. Army Picture Story Test, Series B, 1950. Syracuse University
Press.

a.  Item  Analysis 

The item analysis of the Army Picture Story Test revealed
only a chance proportion of statistically significant items. The
scatterplot between the validity coefficients in Group A (N = 213) and
Group B (N = 222) for the Army Picture Story Test indicated a near zero
relationship, suggesting little consistency of item validity. However,
as in the case of the Picture Interpretation Test, the subjects in the
two samples, Group A and Group B, did respond similarly to individual
items, indicating a marked degree of inter-sample consistency.

A qualitative  analysis  of  the  significant  items  of  the  Army  Picture 
Story  Test  also  failed  to  disclose  meaningful  categories  or  trends  in 
terms  of  postulated  leadership  characteristics. 


Table  5 shows  the  distribution  of  item  validities  in  the  Army 
Picture  Story  Test. 

b. Pattern Analysis

Of the 600 possible patterns of response in the Army
Picture Story Test, 30% of the patterns showed significantly high
criterion relationships at the 20% level of confidence in Group A
(N = 213). The percentage of significant patterns may not be beyond
chance expectations because of inter-pattern correlation. However,
the degree to which the relationships among patterns affect the
number of patterns appearing to possess significant validities is
impossible to determine. Thus, a scoring key was constructed on the
basis of the significant patterns by assigning a weight of +1 to
patterns with positive validity at the 20% level of confidence, and
-1 to patterns with negative validity at this level.
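The construction of such a unit-weight key can be sketched as follows. The pattern labels, validity values, and critical value are all hypothetical and serve only to show the weighting rule; the report does not give the value of r required at the 20% level.

```python
def build_pattern_key(pattern_validities, critical_r):
    """Weight +1 for significantly positive patterns, -1 for significantly
    negative ones; non-significant patterns are left out of the key."""
    key = {}
    for pattern, r in pattern_validities.items():
        if r >= critical_r:
            key[pattern] = 1
        elif r <= -critical_r:
            key[pattern] = -1
    return key

def score_examinee(chosen_patterns, key):
    """Sum the key weights of the response patterns an examinee produced."""
    return sum(key.get(pattern, 0) for pattern in chosen_patterns)

# Hypothetical validities for four response patterns (labels are invented).
validities = {"1-ABC": 0.15, "1-ACB": -0.12, "2-ABC": 0.03, "2-BAC": -0.02}
key = build_pattern_key(validities, critical_r=0.09)  # hypothetical critical value
total = score_examinee(["1-ABC", "2-ABC"], key)
print(key, total)
```

Patterns whose validities do not reach the critical value contribute nothing to the score, so an examinee's total reflects only the patterns retained in the key.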

c. Cross-Validation

The validity of the item analysis keys of the Army
Picture Story Test was estimated by calculating the Pearson
product-moment correlation coefficient between test scores and the Aptitude
Rating leadership criterion. Group A (N = 210) and Group B (N = 232)
yielded validity coefficients of -.08 and -.05 respectively,
indicating essentially zero validity for the test with these samples.


¹ Validities of each pattern have been reported in regular progress
reports submitted to the Department of the Army during the course
of the study.

Table 5. Distribution of Item Validities in the Army Picture Story
Test for Two West Point Samples.

         Group A                      Group B

       r           f                r           f

    .00-.04       144            .00-.04       156
    .05-.09        79            .05-.09        69
    .10-.14        49            .10-.14        45
    .15-.19        18            .15-.19        20
    .20-.24         7            .20-.24         6
    .25-.29         1            .25-.29         1
    .30-.34         1            .30-.34         1
    .35-.39         1            .35-.39         0
                                 .40-.44         0
                                 .45-.49         1
                                 .50-.54         0
                                 .55-.59         0
                                 .60-.64         1
    Total         300 items      Total         300 items

Using the key derived by the pattern analysis on Group A, the
scores of Group B (N = 222) were calculated. The Pearson
product-moment correlation coefficient between the test scores and the
Aptitude Rating leadership criterion was .01, indicating the lack of
validity in the pattern key for this sample.

3. Picture Fill-In Test, Second Form, 1949 (DA AGO PRT-1726)

a. Item Analysis

The proportion of statistically significant items failed
to exceed chance expectancy. The scatterplot between the correlation
coefficients for Group A (N = 213) and Group B (N = 223) indicated
approximately a zero relationship for the Picture Fill-In Test. Thus,
again negative evidence was found in regard to the inter-sample
consistency of item validity. As in the case of the two tests mentioned
previously, a qualitative analysis of the significant items found in
the two samples failed to reveal categories of responses which seemed
to be psychologically meaningful.

However, there was evidence of considerable consistency between
samples  in  the  proportion  of  individuals  who  responded  in  the  same 
way  to  the  items  in  the  test.  This  same  indication  of  the  inter-sample 
consistency  of  the  responses  was  also  found  for  the  two  tests  discussed 
previously:  The  Picture  Interpretation  Test  and  the  Army  Picture  Story 
Test. 


The  following  table  presents  the  distribution  of  item  validities: 
(See  Table  6). 


b. Cross-Validation

Estimates of the validity of the Picture Fill-In Test
were obtained by computing the Pearson product-moment correlation
coefficients between the item analysis keys and the Aptitude Rating
criterion of leadership. The validity of the test for the West Point
samples was found to be approximately zero: in Group A (N = 210) the
validity coefficient was .04 and in Group B (N = 222) the validity
coefficient was .03.

4. West Point Personal Inventory, 1949 (DA AGO PRT-175S)

a. Item Analysis

As explained in Section III, B, 5, a, (4), the scoring
key for this test was provided by the Personnel Research Section, AGO.

b. Cross-Validation

The West Point Personal Inventory was the only one of
the four experimental tests which manifested appreciable validity for
the West Point sample. The Pearson product-moment correlation
coefficient between the scores on this test and the Aptitude Rating leadership
criterion was .351, based on a sample of 426 cadets. This value of .351
is a statistically significant validity coefficient, since a value of
.128 is required for significance at the 1% level of confidence.

Table 6. Distribution of Item Validities in the Picture Fill-In
Test for Two West Point Samples.

     Group A (N = 213)            Group B (N = 223)

       r           f                r           f

    .00-.04       164            .00-.04       161
    .05-.09       106            .05-.09       119
    .10-.14        69            .10-.14        57
    .15-.19        32            .15-.19        29
    .20-.24        12            .20-.24        20
    .25-.29         4            .25-.29         2
    .30-.34         3            .30-.34         2
    .35-.39         1            .35-.39         0
    .40-.44         0            .40-.44         1
    .45-.49         0            .45-.49         1
    .50-.54         0
    .55-.59         0
    .60-.64         0
    .65-.69         1
    Total         392 items      Total         392 items

B. Results with Enlisted Personnel

1. Item-Analysis Group: All Tests

a. Item Analyses

Tables 8, 9, and 10 show the distribution of biserial
r's obtained for items in each test. The sign of the coefficients
was disregarded in tabulating these results, since keying the
responses to each item is only arbitrary.

Table 7 summarizes the number of significant items found for the
two samples at the 10% level of confidence in each of the three tests.

Table 7. Number of Significant Items for Three Experimental
Personality Tests.

                                  Total No.      Number of
                                  of Items       Significant Items

Picture Interpretation Test
  1. Privates (N = 386)              432               99
  2. Non-Coms (N = 228)              432               53

Picture Fill-In Test
  1. Privates                        392              130
  2. Non-Coms                        392               38

Picture Story Test
  1. Privates                        300               48
  2. Non-Coms                        300               45

From this table, it is apparent that more significant items are
found for the sample of privates than is true for non-commissioned
officers. By inspecting the individual items, it is also apparent
that those items found significant in the sample of privates generally
are not found significant in the non-commissioned officer sample.
The results indicate, furthermore, that the Picture Fill-In Test and
Picture Interpretation Test are functioning in the sample of privates
at a level appreciably better than chance expectancy. Since the 10%
level of confidence was adopted, chance, on the average, would result
in approximately 39 significant items for the Picture Fill-In Test,
42 for the Picture Interpretation Test, and 30 for the Army Picture
Story Test, assuming no correlation among the items in each test.
Table 8. Distribution of Item Validities in the Picture Interpretation
Test for Two Enlisted Samples.

     A. Privates                  B. Non-Commissioned Officers
        (N = 385)                    (N = 228)

       r           f                r           f

    .00-.04       152            .00-.04       168
    .05-.09       131            .05-.09       107
    .10-.14        90            .10-.14        88
    .15-.19        44            .15-.19        45
    .20-.24        13            .20-.24        15
    .25-.29         1            .25-.29         2
    .30-.34         1            .30-.34         2
                                 .35-.39         1
                                 .40-.44         2
                                 .45-.49         0
                                 .50-.54         1
                                 .55-.59         0
                                 .60-.64         0
                                 .65-.69         0
                                 .70-.74         1
    Total         432 items      Total         432 items
Table 9. Distribution of Item Validities in the Army Picture Story
Test for Two Enlisted Samples.

     A. Privates                  B. Non-Commissioned Officers
        (N = 385)                    (N = 228)

       r           f                r           f

    .00-.04       132            .00-.04       112
    .05-.09       105            .05-.09        91
    .10-.14        51            .10-.14        59
    .15-.19        11            .15-.19        29
    .20-.24         1            .20-.24         7
                                 .25-.29         2
    Total         300 items      Total         300 items

Table 10. Distribution of Item Validities in the Picture Fill-In Test
for Two Enlisted Samples.

     A. Privates                  B. Non-Commissioned Officers
        (N = 385)                    (N = 228)

       r           f                r           f

    .00-.04       140            .00-.04       176
    .05-.09        99            .05-.09       109
    .10-.14        71            .10-.14        72
    .15-.19        42            .15-.19        27
    .20-.24        30            .20-.24         5
    .25-.29         0            .25-.29         2
    .30-.34         1            .30-.34         1
    Total         392 items      Total         392 items
For the Army Picture Story Test, the number of significant items
was about 1.6 times what would be expected on the basis of chance.
This figure is not large enough to warrant concluding that non-chance
relationships are involved, particularly since the assumption of
non-correlation among items is untenable. It is possible that a pattern
analysis might reveal more convincing evidence for the validity of
this test, in view of the triad form of item responses. Experience
with the pattern analysis performed for the cadet officer sample did
not encourage a parallel analysis for the enlisted sample. In view
of the relatively low number of significant items found for this
test, it was not administered to the cross-validation sample.

for  the  Picture  Interpretation  Test,  about  2.3  times  as  many 
items  were  found  significant  for  the  sample  of  privates  as  would 
he  expected  on  the  basis  of  chance.  Jor  the  Picture  Pill-In  Test, 
this  figure  was  about  3.3.  These  tests  were  selected  to  be  adminis- 
tered to  the  cross-validation  sample,  since  the  evidence  is  suggestive 
of  validity. 
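The comparison of observed with chance-expected counts of significant
items can be sketched as follows. This is a minimal illustration, not
the computation recorded in the report: the function names are
hypothetical, and the count of 48 significant items and the 10% level
are assumed values chosen only so that the ratio comes out near the 1.6
quoted for the Army Picture Story Test.

```python
def expected_by_chance(n_items: int, alpha: float) -> float:
    """Number of items expected to reach significance at level alpha
    if every true item validity were zero."""
    return n_items * alpha


def chance_ratio(n_significant: int, n_items: int, alpha: float) -> float:
    """Ratio of observed significant items to the chance expectation."""
    return n_significant / expected_by_chance(n_items, alpha)


# Hypothetical count: 48 of 300 items significant at an assumed 10% level.
ratio = chance_ratio(48, 300, 0.10)
print(round(ratio, 1))  # 1.6
```

A ratio of this kind is only a rough guide. As the text notes, the
items are correlated with one another, so the observed count varies
more from sample to sample than it would for independent items.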

b.  Comparison of Item Analysis Group and Dropouts

Table 11 shows t tests for the significance of the mean
differences between dropouts and the item analysis group. It had
been expected that dropouts would show lower mean test scores for the
Picture Fill-In Test and for the Picture Interpretation Test. Such
is the case, and furthermore, this difference is significant at the
5% level of confidence for the Picture Interpretation Test, and at
the 1% level of confidence for the Picture Fill-In Test.
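A t ratio of this kind can be approximated from the summary figures
alone. The sketch below assumes a pooled-variance t for two independent
groups and treats the tabled standard deviations as sample values; the
report does not state which formula was used, and the function name is
hypothetical. The inputs are the Picture Fill-In figures from Table 11.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, from summary statistics.

    Assumes equal population variances and sample standard deviations
    computed with n - 1 in the denominator.
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error


# Picture Fill-In Test: item analysis group versus dropouts (Table 11).
t_fill_in = pooled_t(90.6, 21.6, 50, 68.8, 21.1, 25)
print(round(t_fill_in, 2))  # about 4.15
```

The result is close to, though not identical with, the tabled value of
4.26; rounding of the published means and standard deviations, or a
slightly different formula, could account for the gap.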

3.  Cross-Validation 

a.  Reliabilities and Related Statistics

Table 12 summarizes reliabilities and related statistics
calculated from the cross-validation sample for the Picture Fill-In
Test and the Picture Interpretation Test.

These results indicate that scores for these instruments are
sufficiently reliable to be useful for large-scale classification
purposes.
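The relation between the odd-even coefficient, the Spearman-Brown
estimate, and the standard error of measurement can be sketched as
follows. The function names are hypothetical. The inputs are the
total-sample figures from Table 12, and the tabled standard errors
appear to correspond to the odd-even coefficient rather than the
stepped-up one, which is the assumption made here.

```python
import math


def spearman_brown(r_half: float) -> float:
    """Step up a split-half (odd-even) correlation to full test length."""
    return 2 * r_half / (1 + r_half)


def se_measurement(sd: float, reliability: float) -> float:
    """Standard error of measurement for a given reliability coefficient."""
    return sd * math.sqrt(1 - reliability)


# Picture Interpretation Test, total sample: S = 9.97, odd-even r = .74
print(round(spearman_brown(0.74), 2))        # 0.85
print(round(se_measurement(9.97, 0.74), 2))  # 5.08

# Picture Fill-In Test, total sample: S = 15.85, odd-even r = .84
print(round(spearman_brown(0.84), 2))        # 0.91
print(round(se_measurement(15.85, 0.84), 2)) # 6.34
```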

b.  Validities

(1)  Item Analysis Keys

Table 13 summarizes validity coefficients for the
keys developed from item analysis of the Picture Fill-In and Picture
Interpretation Tests.

Table 11.  Significance of Mean Differences between Dropouts and
           Item Analysis Group for Picture Fill-In and Picture
           Interpretation Tests

                                    Picture      Picture
                                    Fill-In      Interpretation

Mean of Dropout Group                68.8          60.4
S. D. of Dropout Group               21.1           9.6
N of Dropout Group                   25            25
Mean of Item Analysis Group          90.6          56.2
S. D. of Item Analysis Group         21.6          11.8
N of Item Analysis Group             50            50
t Ratio between groups                4.26**        2.44*

 *  Significant at 5% level
**  Significant at 1% level


Table 12.  Reliabilities and Related Statistics Estimated from Cross-
           Validation Samples at Leaders' Schools for the Picture
           Interpretation and Picture Fill-In Tests

                     Picture Interpretation Test

                                              S.E. of  Odd-Even  Spearman-
                  N     Mean      S      S²    Meas.     Rel.     Brown

Ft. Jackson      158   61.34   10.31  106.25   4.50      .81       .90
Ft. Belvoir      138   60.51    9.58   91.80   5.58      .66       .80
Total Sample     296   60.95    9.97   99.47   5.08      .74       .85

                     Picture Fill-In Test

                                              S.E. of  Odd-Even  Spearman-
                  N     Mean      S      S²    Meas.     Rel.     Brown

Ft. Jackson      145   97.39   12.53  157.08   5.46      .81       .90
Ft. Belvoir      133   91.91   18.61  346.25   5.58      .91       .96
Total Sample     278   94.77   15.85  251.28   6.34      .84       .91

Table 13.  Validity Coefficients for the Picture Fill-In and
           Picture Interpretation Tests in the Privates' Sample.

                              Ft. Jackson   Ft. Belvoir   Combined

Picture Fill-In Test
     r                           .19*          .24**        .19**
     N                           124           132          256
     M                           97.0          91.3         94.1
     S. D.                       13.2          18.8         16.7

Picture Interpretation Test
     r                           .36**         .16          .25**
     N                           133           136          269
     M                           61.3          60.4         60.8
     S. D.                        9.9           9.7          9.8

 *  Significant at the 5% level of confidence.
**  Significant at the 1% level of confidence.

All correlations reported are significantly different from zero
at the 5% level of confidence with the exception of r = .16 for the
Ft. Belvoir sample on the Picture Interpretation Test. This correlation
approaches significance at the 5% level of confidence so closely that
it, too, is unlikely to be considered a sampling fluctuation from a
correlation of zero.
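How closely r = .16 approaches significance can be illustrated with the
usual t test for a correlation coefficient. This is a sketch under the
assumption of a two-tailed test; the report does not say how the
significance levels were obtained, and the function name is
hypothetical.

```python
import math


def t_for_r(r: float, n: int) -> float:
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Picture Interpretation Test, Ft. Belvoir: r = .16, N = 136 (Table 13).
t_belvoir = t_for_r(0.16, 136)
print(round(t_belvoir, 2))  # about 1.88

# Picture Fill-In Test, Ft. Jackson: r = .19, N = 124 (Table 13).
t_jackson = t_for_r(0.19, 124)
print(round(t_jackson, 2))  # about 2.14
```

With samples of this size the two-tailed 5% critical value of t is
roughly 1.98, so the Ft. Belvoir coefficient falls just short of it
while the Ft. Jackson coefficient exceeds it.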

Differences in mean criterion scores for the two installations
were not statistically significant at the 5% level of confidence.

On the basis of this result, validity coefficients were computed
using Associate Ratings as raw scores rather than converting them to
standard scores.

Tests  of  the  significance  of  the  difference  between  sample 
correlation  coefficients  were  performed  in  order  to  compare  validity 
coefficients  at  the  two  installations.  These  results  fail  to  reveal 
a difference,  significant  at  the  level  of  confidence,  between  the 
validity  coefficients  at  the  two  installations.  The  tests  seem  to  be 
about  equally  valid  in  both  of  the  cross-validation  samples. 
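A comparison of two validity coefficients from independent samples can
be sketched with Fisher's z transformation. This is an assumption for
illustration: the report does not name the test that was applied, and
the function name is hypothetical. The inputs are the installation
coefficients from Table 13.

```python
import math


def fisher_z_difference(r1: float, n1: int, r2: float, n2: int) -> float:
    """Normal deviate for the difference between two independent
    correlation coefficients, using Fisher's z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    std_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / std_error


# Picture Interpretation Test: Ft. Jackson .36 (N = 133), Ft. Belvoir .16 (N = 136)
z_interpretation = fisher_z_difference(0.36, 133, 0.16, 136)
print(round(z_interpretation, 2))  # about 1.75

# Picture Fill-In Test: Ft. Jackson .19 (N = 124), Ft. Belvoir .24 (N = 132)
z_fill_in = fisher_z_difference(0.19, 124, 0.24, 132)
print(round(z_fill_in, 2))  # about -0.41
```

Neither deviate reaches 1.96 in absolute value, which agrees with the
statement that the installations do not differ significantly in
validity.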

(2)  Item Classification Key for the Picture Interpretation Test

Table 14 presents validity coefficients for the item
classification key, for the item analysis key, for the combination
of the two, and the correlation between the item analysis and item
classification keys. The criterion used for the validity
coefficients was that of Associate Ratings.

The fact that the item classification key correlates
significantly with the item analysis key is interpreted to mean that these
categories have an appreciable degree of internal consistency. The
failure of the items classified to correlate significantly with the
criterion is an indication of their consistent lack of validity for
this criterion even in a cross-validation sample.

(3)  Multiple Correlation

Table 15 shows the multiple correlations for the total
cross-validation sample, and the intercorrelations from which they
were computed.

The standard errors of these R's are such as to render no
combination of the tests appreciably superior in prediction to another,
nor, indeed, to the Leaders' Self-Description Blank alone.
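The way a multiple correlation follows from the intercorrelations can
be sketched as below. The standardized regression weights are obtained
by solving the normal equations, and R squared is the sum of each weight
times the corresponding validity. The function names are hypothetical,
the small solver is included only to keep the example self-contained,
and the inputs are the rounded coefficients of Table 15, so the results
are approximate.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def multiple_r(predictor_corr, validities):
    """Multiple correlation of a criterion with a set of predictors."""
    betas = solve(predictor_corr, validities)
    r_squared = sum(b * v for b, v in zip(betas, validities))
    return math.sqrt(r_squared)


# Predictors: (1) Picture Fill-In, (2) Picture Interpretation, (3) L.S.D.B.
r_two_tests = multiple_r([[1.0, 0.18], [0.18, 1.0]], [0.17, 0.23])
r_three_tests = multiple_r(
    [[1.0, 0.18, 0.34], [0.18, 1.0, 0.37], [0.34, 0.37, 1.0]],
    [0.17, 0.23, 0.30],
)
print(round(r_two_tests, 2))    # about 0.26
print(round(r_three_tests, 2))  # about 0.33; Table 15 lists .32
```

On these rounded inputs the three-test combination adds only about .03
to the validity of .30 shown for the Leaders' Self-Description Blank
alone.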


Table 14.  Validity Coefficients of the Item Analysis and Item
           Classification Keys for the Picture Interpretation Test
           and the Correlation between Keys.

                                        Privates

                              Ft. Jackson   Ft. Belvoir   Combined

Item Classification
  Key Alone                     -.01          -.18         -.06
     N                           134           136          270

Item Classification Key
  and Item Analysis Key          .25           .06          .16
     N                           133           135          367

Item Analysis Key Alone          .36           .16          .25
     N                           133           136          269

Item Analysis Key vs.
  Item Classification Key        .54           .26          .40
     N                           158           138          296

Table 15.  Intercorrelations among Predictor Variables, Correlations
           with Associate Rating Criterion, and Multiple Correlations
           for Cross-Validation Sample

     (N = 234 cases in cross-validation sample with all 4 measures)

                                  Assoc.
                                  Rating   P.F.I.   P.I.T.   L.S.D.B.

Associate Rating (0)               ....
Picture Fill-In (1)                .17      ....
Picture Interpretation (2)         .23      .18      ....
Leaders' Self-Description
  Blank (3)                        .30      .34      .37

Multiple Correlations
     R 0.13  =
     R 0.123 = .32


V.  CONCLUSIONS


The conclusions to be derived from the results described in the
preceding section will be presented below in the sequence
corresponding to the objectives set forth in Section II.

A.  Validity of the three objective projective tests for
measuring leadership performance of West Point cadets.

1.  In none of the three tests does the aggregation of items
correlate with the leadership criterion (Aptitude Rating) appreciably
beyond chance expectation. This follows from the fact that the number
of items found statistically significant at a given level of confidence
is no greater than the number which would manifest that degree of
validity through sampling fluctuations about a true validity of zero.

2.  When scoring keys are developed independently for each
of two representative samples by keying items whose individual
validity coefficients appear to be statistically significant, there is
practically no better than chance correspondence in the items comprised
within the two keys. Hence, it can be inferred that there is
inadequate consistency of the scoring keys from sample to sample.
Furthermore, each of the two keys for each test correlates approximately
zero with the leadership criterion in its cross-validation sample.

3.  Duplicating previous findings of the Personnel Research
Section, the West Point Personal Inventory is found to correlate
appreciably and significantly with the leadership criterion. In
addition to demonstrating the validity of the test, this indicates
that the criterion is predictable.

Discussion: This investigation failed to reveal a stable and
valid method of keying the responses to three objective projective
tests so as to predict leadership among West Point cadets with
better-than-chance efficiency. This failure cannot be attributed totally to
inadequacy of the leadership criterion, the Aptitude Rating, for it
is predictable from scores on the West Point Personal Inventory.

B.  Validity of three objective projective tests for measuring
leadership performance of Army non-commissioned personnel.

1.  a.  Among privates enrolled at Leaders' Schools, the
Army Picture Story Test failed to yield a proportion of items which
correlates with the Associate Rating appreciably beyond chance
expectation. However, in both the Picture Interpretation Test and
the Picture Fill-In Test, considerably more than 10 per cent of the
items were valid at least at the 10% level of confidence. Hence it
was possible to develop a scoring key for each of these two tests
with some expectation that the resulting scores would correlate
appreciably with the leadership criterion in new samples.

b.  Among non-commissioned officers enrolled in Leaders'
Schools, none of the three tests yields an aggregation of items which
correlates with the leadership criterion (Associate Rating) appreciably
beyond chance expectation. This follows from the fact that the number
of items found to be statistically significant at a given level of
confidence is no greater than the number which would manifest that
degree of validity through chance fluctuations about a true validity
of zero.

2.  a.  When these keys for the Picture Interpretation and
Picture Fill-In Tests are applied in new samples of privates enrolled
in Leaders' Schools, the Spearman-Brown reliabilities of the scores
are .85 and .91, respectively.

The correlation of these scores with the Associate Rating
criterion yields a validity coefficient of .25 for the Picture
Interpretation Test and .19 for the Picture Fill-In Test; these
validity coefficients are significant at beyond the 1% level of
confidence.

These keys also discriminate, on the average, between trainees
who, for various reasons (including lack of leadership potential),
are separated early in the program and those who are graduated.

b.  (No cross-validation was undertaken with non-commissioned
officers, in view of the negative results of the item analysis.)

3.  a.  The Leaders' Self-Description Blank is found to
correlate appreciably and significantly with the Associate Rating criterion
for privates. The validity coefficients of this and the other two
tests do not differ from one another at the 5% level of confidence.

b.  The multiple correlation between the criterion and the
two projective tests is not appreciably higher than the validity
coefficient of the Picture Interpretation Test alone. The multiple
correlation between the criterion and the two projective tests plus
the Leaders' Self-Description Blank is not appreciably higher than
the validity coefficient of the last-named test alone.

c.  The Picture Interpretation and Picture Fill-In Tests are
virtually uncorrelated with one another. This, in the light of their
relatively high reliabilities, suggests considerable independence of
the factors measured. Both of the tests correlate somewhat more highly
with the Leaders' Self-Description Blank, although still at a level far
below their reliability coefficients.
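The argument from reliability to independence can be made concrete with
the classical correction for attenuation, which estimates what the
correlation between two tests would be if both were perfectly reliable.
The report itself does not apply this correction; the sketch below is
an added illustration with a hypothetical function name, using the
intercorrelation of .18 from Table 15 and the Spearman-Brown
reliabilities of .85 and .91.

```python
import math


def disattenuated_r(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correlation corrected for unreliability in both measures."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Picture Fill-In versus Picture Interpretation
corrected = disattenuated_r(0.18, 0.85, 0.91)
print(round(corrected, 2))  # about 0.20
```

Even after correction the coefficient remains near .20, so the low
observed correlation cannot be ascribed to unreliability of the scores.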

Discussion: The Army Picture Story Test revealed evidence of
validity in neither the sample of non-commissioned officers nor of
privates. The Picture Interpretation Test and the Picture Fill-In
Test evidenced no validity in the non-commissioned officer sample but
both showed appreciable and significant validity in the sample of
privates.


It should be recognized, in this connection, that the Associate
Rating criterion does not necessarily represent a total appraisal of
leadership performance. This a priori consideration is substantiated
by the fact that this particular criterion was only slightly
correlated with other assessments of leadership obtained for those personnel
in the Leaders' School. It is, therefore, conceivable that the validity
of these tests for a more comprehensive criterion would be considerably
higher.

C.  Inferences concerning the general promise of the objective
projective type of test.

The fact that these tests correlate significantly and appreciably
with a limited criterion of leadership performance among privates, and
their relative independence of one another and the Leaders' Self-
Description Blank, combine to suggest that we have here a type of
instrument that may constitute an important contribution to the
techniques for identifying potential leaders. However, in the light of
the results, this statement must be made with certain limitations.

In the first place, the failure of the tests to function with samples
of cadets and non-commissioned officers indicates that they may be
most effective with personnel of relatively little sophistication and
limited experience. Secondly, it should be recalled that the
predictive value of the tests has been shown only for the Associate Rating
criterion; this criterion probably does not reflect all aspects of
leadership performance.

Conversely, the positive results that have been obtained with
these measures by no means define the limits of their value. They
may, for example, correlate as well or better with other aspects of
Army leadership performance, or with similar criteria in other types
of leadership situations, such as under field conditions. Their
validity for other types of performance involving personality factors,
e.g., adaptability to stress situations, is likewise in the realm of
possibility. At present, these are, of course, moot points. But the
promise evidenced by these tests for the limited criterion and
situations involved in this study implies the advisability of exploring their
value for other criteria and situations.

Such further work should, however, be done with revisions of the
present tests. Many items which failed to show validity in this study
could well be eliminated, and replaced by other items constructed
along lines which the present study suggests are more fruitful.
Suggestions for such future modifications are incorporated in the
following section.

VI.  IMPLICATIONS AND RECOMMENDATIONS


The major implication of the present study is that there are at
least two of these objective projective tests (the Picture
Interpretation and Picture Fill-In Tests) which evidence validity for leadership
criteria. Yet, the results indicate that there is considerably more
work to be done before these instruments will have the utility which
practical considerations require.

For one thing, the evidence is that the tests, as presently
constituted, are valid only within a limited segment of personnel in
connection with whom leadership assessments would be made, namely,
the youngest and least experienced group. It behooves us to consider
why this should be the case and what may be done to augment the range
of utility of these instruments.

For another thing, the evidence does not suggest that these tests
should be used to supplant or supplement the Leaders' Self-Description
Blank, which is currently available for assessing leadership potential
of enlisted personnel. The reason for this statement is that neither of
the two new tests correlates better with the criterion than the Leaders'
Self-Description Blank, nor is the multiple correlation of the three
tests appreciably higher than the validity of the Leaders' Self-Description
Blank. However, it should be noted that the failure of the new tests to
add appreciably to the validity of the Leaders' Self-Description Blank
is not a function mainly of the degree of overlap among the measures;
rather, the failure is primarily attributable to the relatively low (even
though significant) validity manifested by the new tests. It would thus
seem that revision of these tests so as to augment their predictive value
could well result in useful additions to current selection instruments.

There are therefore two dimensions along which revision should
proceed: (1) broadening of the range of personnel to which the tests
are applicable, and (2) intensifying the discrimination power of the
items within this range. Conceivably, of course, the two objectives may
not be attainable with a single form of each test, so that a different
form may be required for each of several types of personnel, e.g.,
commissioned and enlisted.


Scrutiny of the characteristics of the various items serves as
a source of hypotheses for effecting these improvements. For example,
extending the effective range of these tests to reach more highly
educated and experienced personnel seems to require modification in
two respects: (1) the situations depicted should be more appropriate
to the interests and vital experiences of such personnel; at present,
most of the situations appear to be rather simple, socially and
motivationally, making it difficult to elicit real identification and
projection on the part of more sophisticated subjects; (2) in spite of the
projective approach, there may still be too much transparency in regard
to what constitutes "right" and "wrong" answers; the use of more subtle
situations and responses, and possibly also the use of forced-choice
responses, may produce marked improvements in the suitability of such
tests for higher-level personnel.

The latter point, namely forced-choice coupling of response
alternatives, may also be a fruitful method of improving the
discrimination power of the items for any level of personnel, but more than
this should be done to attain this objective: new items should be
prepared whose content is likely to generate validity. Clues from the
scrutiny of present valid and invalid items should be used as a basis
for this work.

Among these clues are the following:

A.  For Picture Interpretation Test, items depicting high or low
prestige activities seem more frequently to be valid. The same is
true of items portraying leadership or dominance.

B.  For Picture Fill-In Test, the discriminating responses seem
to be those which reflect extra-punitiveness and intra-punitiveness.
Social appropriateness or inter-personal skill seems to be another
dimension along which a number of the items discriminate.

The elimination of the many non-discriminating items from the
present forms of both of these tests, and their replacement by items
constructed along the lines suggested above may well be productive
of greater validity.

Most  of  the  facts  needed  to  effect  modifications  along  lines 
suggested  above  already  are  at  hand.  For  example,  the  types  of  items 
which  seem  most  productive  of  validity  could  readily  be  classified 
beyond  the  point  already  accomplished.  Also,  as  regards  the  forced- 
choice  possibility,  preference  values  (defined  in  terms  of  frequency 
of  response  selection)  and  validity  coefficients  are  available  for 
all  items. 

The basic decision that must first be made is whether or not to
proceed further with instruments of this type. It is felt that the
evidence disclosed in this investigation is that the Army possesses
at least two new type tests which show appreciable validity for
leadership criteria, while being, at the same time, relatively
independent of existing instruments used for leadership assessment. These
considerations cogently denote the promise inherent in further
exploration and development along these lines. This is particularly true in
view of the critical need for techniques of leadership assessment and
the paucity of existing means for meeting this need.


